Mikael Morin [Wed, 31 Aug 2022 09:00:45 +0000 (11:00 +0200)]
fortran: Move the clobber generation code
This change inlines the clobber generation code from
gfc_conv_expr_reference to the single caller from where the add_clobber
flag can be true, and removes the add_clobber argument.
What motivates this is the standard making the procedure call a cause
for a variable to become undefined, which translates to a clobber
generation, so clobber generation should be closely related to procedure
call generation, whereas it is rather orthogonal to variable reference
generation. Thus the generation of the clobber feels more appropriate
in gfc_conv_procedure_call than in gfc_conv_expr_reference.
Behaviour remains unchanged.
gcc/fortran/ChangeLog:
* trans.h (gfc_conv_expr_reference): Remove add_clobber
argument.
* trans-expr.cc (gfc_conv_expr_reference): Ditto. Inline code
depending on add_clobber and conditions controlling it ...
(gfc_conv_procedure_call): ... to here.
arm: Fix constant immediates predicates and constraints for some MVE builtins
Several MVE builtins incorrectly use the same predicate/constraint
pair for several modes, which does not match the specification.
This patch uses the appropriate iterator instead.
Richard Biener [Wed, 14 Sep 2022 07:00:35 +0000 (09:00 +0200)]
tree-optimization/106934 - avoid BIT_FIELD_REF of bitfields
The following avoids creating BIT_FIELD_REF of bitfields in
update-address-taken. The patch doesn't implement punning to
a full precision integer type but leaves a comment according to
that.
Richard Biener [Thu, 15 Sep 2022 11:33:23 +0000 (13:33 +0200)]
tree-optimization/106922 - PRE and virtual operand translation
PRE implicitely keeps virtual operands at the blocks incoming version
but the explicit updating point during PHI translation fails to trigger
when there are no PHIs at all in a block. Later lazy updating then
fails because of a too lose block check. A similar issues plagues
reference invalidation when checking the ANTIC_OUT to ANTIC_IN
translation. The following fixes both and makes the lazy updating
work.
The diagnostic testcase unfortunately requires boost so the
testcase is the one I reduced for a missed optimization in PRE.
The testcase fails with -m32 on x86_64 because we optimize too
much before PRE which causes PRE to not trigger so we fail to
eliminate a full redundancy. I'm going to open a separate bug
for this. Hopefully the !lp64 selector is good enough.
PR tree-optimization/106922
* tree-ssa-pre.cc (translate_vuse_through_block): Only
keep the VUSE if its def dominates PHIBLOCK.
(prune_clobbered_mems): Rewrite logic so we check whether
a value dies in a block when the VUSE def doesn't dominate it.
Richard Biener [Fri, 9 Sep 2022 10:06:38 +0000 (12:06 +0200)]
tree-optimization/106892 - avoid invalid pointer association in predcom
When predictive commoning builds a reference for iteration N it
prematurely associates a constant offset into the MEM_REF offset
operand which can be invalid if the base pointer then points
outside of an object which alias-analysis does not consider valid.
PR tree-optimization/106892
* tree-predcom.cc (ref_at_iteration): Do not associate the
constant part of the offset into the MEM_REF offset
operand, across a non-zero offset.
The following avoids adding PHIs to the worklist for uninit processing
if we reach them following backedges. That confuses predicate analysis
because it assumes the use is happening in the same iteration as the the
definition. For the testcase in the PR the situation is like
void foo (int val)
{
int uninit;
# val = PHI <..> (B)
for (..)
{
if (..)
{
.. = val; (C)
val = uninit;
}
# val = PHI <..> (A)
}
}
and starting from (A) with 'uninit' as argument we arrive at (B)
and from there at (C). Predicate analysis then tries to prove
the predicate of (B) (not the backedge) can prove that the
path from (B) to (C) is unreachable which isn't really what it
necessary - that's what we'd need to do when the preheader
edge of the loop were the edge with the uninitialized def.
So the following makes those cases intentionally false negatives.
PR tree-optimization/105937
* tree-ssa-uninit.cc (find_uninit_use): Do not queue PHIs
on backedges.
(execute_late_warn_uninitialized): Mark backedges.
Fortran: Fix ICE and wrong code for assumed-rank arrays [PR100029, PR100040]
gcc/fortran/ChangeLog:
PR fortran/100040
PR fortran/100029
* trans-expr.cc (gfc_conv_class_to_class): Add code to have
assumed-rank arrays recognized as full arrays and fix the type
of the array assignment.
(gfc_conv_procedure_call): Change order of code blocks such that
the free of ALLOCATABLE dummy arguments with INTENT(OUT) occurs
first.
gcc/testsuite/ChangeLog:
PR fortran/100029
* gfortran.dg/PR100029.f90: New test.
PR fortran/100040
* gfortran.dg/PR100040.f90: New test.
Jason Merrill [Thu, 29 Sep 2022 17:45:02 +0000 (13:45 -0400)]
c++: fix triviality of class with unsatisfied op=
cxx20_pair is trivially copyable because it has a trivial copy constructor
and only a deleted copy assignment operator; the non-triviality of the
unsatisfied copy assignment overload is not considered.
Harald Anlauf [Tue, 27 Sep 2022 18:54:28 +0000 (20:54 +0200)]
Fortran: error recovery while simplifying intrinsic UNPACK [PR107054]
gcc/fortran/ChangeLog:
PR fortran/107054
* simplify.cc (gfc_simplify_unpack): Replace assert by condition
that terminates simplification when there are not enough elements
in the constructor of argument VECTOR.
gcc/testsuite/ChangeLog:
PR fortran/107054
* gfortran.dg/pr107054.f90: New test.
H.J. Lu [Tue, 27 Sep 2022 23:19:11 +0000 (16:19 -0700)]
i386: Mark XMM4-XMM6 as clobbered by encodekey128/encodekey256
encodekey128 and encodekey256 operations clear XMM4-XMM6. But it is
documented that XMM4-XMM6 are reserved for future usages and software
should not rely upon them being zeroed. Change encodekey128 and
encodekey256 to clobber XMM4-XMM6.
This patch adds -mcpu/-mtune support for the Arm Neoverse V2 core.
This updates the internal references to "demeter", but leaves "demeter" as an
accepted value to -mcpu/-mtune as it appears in the released GCC 12 series.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (neoverse-v2): New entry.
(demeter): Update tunings to neoversev2.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc (demeter_addrcost_table): Rename to
neoversev2_addrcost_table.
(demeter_regmove_cost): Rename to neoversev2_addrcost_table.
(demeter_advsimd_vector_cost): Rename to neoversev2_advsimd_vector_cost.
(demeter_sve_vector_cost): Rename to neoversev2_sve_vector_cost.
(demeter_scalar_issue_info): Rename to neoversev2_scalar_issue_info.
(demeter_advsimd_issue_info): Rename to neoversev2_advsimd_issue_info.
(demeter_sve_issue_info): Rename to neoversev2_sve_issue_info.
(demeter_vec_issue_info): Rename to neoversev2_vec_issue_info.
Update references to above.
(demeter_vector_cost): Rename to neoversev2_vector_cost.
(demeter_tunings): Rename to neoversev2_tunings.
(aarch64_vec_op_count::rename_cycles_per_iter): Use
neoversev2_sve_issue_info instead of demeter_sve_issue_info.
* doc/invoke.texi (AArch64 Options): Document neoverse-v2.
It turns out that GTY(()) markers in definitions like:
GTY(()) tree scalar_types[NUM_VECTOR_TYPES];
are not effective and are silently ignored. The GTY(()) has
to come after an extern or static.
The externs associated with the SVE ACLE GTY variables are in
aarch64-sve-builtins.h. This file is not in tm_include_list because
we don't want every target-facing file to include it. It therefore
isn't in the list of GC header files either.
In this case that's a blessing in disguise, since the variables
belong to a namespace and gengtype doesn't understand namespaces.
I think the fix is instead to add an extra extern before each
variable declaration, similarly to varasm.cc and vtable-verify.cc.
(This works due to a "using namespace" at the end of the file.)
gcc/
PR target/106491
* config/aarch64/aarch64-sve-builtins.cc (scalar_types)
(acle_vector_types, acle_svpattern, acle_svprfop): Add GTY
markup to (new) extern declarations instead of to the main
definition.
Kewen Lin [Tue, 13 Sep 2022 09:14:23 +0000 (04:14 -0500)]
rs6000: Fix the check of bif argument number [PR104482]
As PR104482 shown, it's one regression about the handlings when
the argument number is more than the one of built-in function
prototype. The new bif support only catches the case that the
argument number is less than the one of function prototype, but
it misses the case that the argument number is more than the one
of function prototype. Because it uses "n != expected_args",
n is updated in
for (n = 0; !VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
fnargs = TREE_CHAIN (fnargs), n++)
, it's restricted to be less than or equal to expected_args with
the guard !VOID_TYPE_P (TREE_VALUE (fnargs)), so it's wrong.
The fix is to use nargs instead, also move the checking hunk's
location ahead to avoid useless further scanning when the counts
mismatch.
PR target/104482
gcc/ChangeLog:
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin): Fix
the equality check for argument number, and move this hunk ahead.
PR105485 exposes that new builtin function framework doesn't handle
unresolved overloaded builtin function well. With new builtin
function support, we don't have builtin info for any overloaded
rs6000_gen_builtins enum, since they are expected to be resolved to
one specific instance. So when function rs6000_gimple_fold_builtin
faces one unresolved overloaded builtin, the access for builtin info
becomes out of bound and gets ICE then.
We should not try to fold one unresolved overloaded builtin there
and as the previous support we should emit one error message during
expansion phase like "unresolved overload for builtin ...".
PR target/105485
gcc/ChangeLog:
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Add
the handling for unresolved overloaded builtin function.
(rs6000_expand_builtin): Likewise.
Fix PR target/99184: Wrong cast from double to 16-bit and 32-bit ints
this patch fixed PR target/99184 which incorrectly rounded during 64-bit
(long) double to 16-bit and 32-bit integers.
The patch just removes the respective roundings from
libf7-asm.sx::to_integer and ::to_unsigned. Luckily, LibF7 does nowhere
use respective functions internally, the only user is in libf7.c::f7_exp
which reads
f7_round (qq, qq);
int16_t q = f7_get_s16 (qq);
so that f7_get_s16() operates on an already rounded value, and therefore
this code works unaltered with or without rounding in to_integer.
Ian Lance Taylor [Tue, 13 Sep 2022 17:33:20 +0000 (10:33 -0700)]
runtime: ignore __morestack function in runtime.Callers
Backport from trunk.
We were ignoring all functions starting with "__morestack_", but not
the function "__morestack" itself. Without this change, some tests
such as recover.go started failing recently, though I'm not sure
exactly what changed.
Kewen Lin [Wed, 7 Sep 2022 01:37:57 +0000 (20:37 -0500)]
rs6000/test: Fix empty TU in some cases of effective targets [PR106345]
As the failure of test case gcc.target/powerpc/pr92398.p9-.c in
PR106345 shows, some test sources for some powerpc effective
targets use empty translation unit wrongly. The test sources
could go with options like "-ansi -pedantic-errors", then those
effective target checkings will fail unexpectedly with the
error messages like:
error: ISO C forbids an empty translation unit [-Wpedantic]
This patch is to fix empty TUs with one dummy function definition
accordingly.
PR testsuite/106345
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_powerpc_sqrt): Add
a function definition to avoid pedwarn about empty translation unit.
(check_effective_target_has_arch_pwr5): Likewise.
(check_effective_target_has_arch_pwr6): Likewise.
(check_effective_target_has_arch_pwr7): Likewise.
(check_effective_target_has_arch_pwr8): Likewise.
(check_effective_target_has_arch_pwr9): Likewise.
(check_effective_target_has_arch_pwr10): Likewise.
(check_effective_target_has_arch_ppc64): Likewise.
(check_effective_target_ppc_float128): Likewise.
(check_effective_target_ppc_float128_insns): Likewise.
(check_effective_target_powerpc_vsx): Likewise.
Although PR106320 affected only the 10 and 11 branches, and the testcase
from there is already correctly accepted on trunk and the 12 branch, we
still should add the testcase to trunk/12 too for inter-branch consistency.
PR libstdc++/106320
libstdc++-v3/ChangeLog:
* testsuite/std/ranges/adaptors/join.cc (test13): New test.
Richard Biener [Wed, 7 Sep 2022 08:44:33 +0000 (10:44 +0200)]
tree-optimization/106860 - fix profile scaling in split_loop
The following fixes a mistake in loop splitting which assumes loop
latches have a single predecessor and that edge is from the exit
test. Instead work from the single exit edge we have to find the
edge towards the latch.
PR tree-optimization/106860
* tree-ssa-loop-split.cc (split_loop): Find the exit to
latch edge from the loop exit edge instead of from the
latch. Verify we're going to find it.
Richard Biener [Fri, 2 Sep 2022 11:36:13 +0000 (13:36 +0200)]
tree-optimization/106809 - compile time hog in VN
The dominated_by_p_w_unex function is prone to high compile time.
With GCC 12 we introduced a VN run for uninit diagnostics which now
runs into a degenerate case with bison generated code. Fortunately
this case is easy to fix with a simple extra check - a more
general fix needs more work.
PR tree-optimization/106809
* tree-ssa-sccvn.cc (dominaged_by_p_w_unex): Check we have
more than one successor before doing extra work.
Jonathan Wakely [Mon, 22 Aug 2022 14:42:17 +0000 (15:42 +0100)]
libstdc++: Fix for explicit copy ctors in <thread> and <future> [PR106695]
When I changed std::thread and std::async to avoid unnecessary move
construction of temporaries, I introduced a regression where types with
an explicit copy constructor could not be passed to std::thread or
std::async. The fix is to add a constructor instead of using aggregate
initialization of an unnamed temporary.
libstdc++-v3/ChangeLog:
PR libstdc++/106695
* include/bits/std_thread.h (thread::_State_impl): Forward
individual arguments to _Invoker constructor.
(thread::_Invoker): Add constructor. Delete copies.
* include/std/future (__future_base::_Deferred_state): Forward
individual arguments to _Invoker constructor.
(__future_base::_Async_state_impl): Likewise.
* testsuite/30_threads/async/106695.cc: New test.
* testsuite/30_threads/thread/106695.cc: New test.
Jonathan Wakely [Mon, 22 Aug 2022 14:16:16 +0000 (15:16 +0100)]
libstdc++: Check for overflow in regex back-reference [PR106607]
Currently we fail to notice integer overflow when parsing a
back-reference expression, or when converting the parsed result from
long to int. This changes the result to be int, so no conversion is
needed, and uses the overflow-checking built-ins to detect an
out-of-range back-reference.
libstdc++-v3/ChangeLog:
PR libstdc++/106607
* include/bits/regex_compiler.tcc (_Compiler::_M_cur_int_value):
Use built-ins to check for integer overflow in back-reference
number.
* testsuite/28_regex/basic_regex/106607.cc: New test.
Peter Bergner [Thu, 1 Sep 2022 02:14:36 +0000 (21:14 -0500)]
rs6000: Don't ICE when we disassemble an MMA variable [PR101322]
When we expand an MMA disassemble built-in with C++ using a pointer that
is cast to a valid MMA type, the type isn't passed down to the expand
machinery and we end up using the base type of the pointer which leads to
an ICE. This patch enforces we always use the correct MMA type regardless
of the pointer type being used.
2022-08-31 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/101322
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin):
Enforce the use of a valid MMA pointer type.
gcc/testsuite/
PR target/101322
* g++.target/powerpc/pr101322.C: New test.
cselib: add function to check if SET is redundant [PR106187]
A SET operation that writes memory may have the same value as an
earlier store but if the alias sets of the new and earlier store do
not conflict then the set is not truly redundant. This can happen,
for example, if objects of different types share a stack slot.
To fix this we define a new function in cselib that first checks for
equality and if that is successful then finds the earlier store in the
value history and checks the alias sets.
The routine is used in two places elsewhere in the compiler:
cfgcleanup and postreload.
gcc/ChangeLog:
PR rtl-optimization/106187
* alias.h (mems_same_for_tbaa_p): Declare.
* alias.cc (mems_same_for_tbaa_p): New function.
* dse.cc (record_store): Use it instead of open-coding
alias check.
* cselib.h (cselib_redundant_set_p): Declare.
* cselib.cc: Include alias.h
(cselib_redundant_set_p): New function.
* cfgcleanup.cc: (mark_effect): Use cselib_redundant_set_p instead
of rtx_equal_for_cselib_p.
* postreload.cc (reload_cse_simplify): Use cselib_redundant_set_p.
(reload_cse_noop_set_p): Delete.
Richard Earnshaw [Wed, 11 May 2022 12:08:40 +0000 (13:08 +0100)]
arm: correctly handle misaligned MEMs on MVE [PR105463]
Vector operations in MVE must be aligned to the element size, so if we
are asked for a misaligned move in a wider mode we must recast it to a
form suitable for the known alignment (larger elements have better
address offset ranges, so there is some advantage to using wider
element sizes if possible). Whilst fixing this, also rework the
predicates used for validating operands - the Neon predicates are
not right for MVE.
gcc/ChangeLog:
PR target/105463
* config/arm/mve.md (*movmisalign<mode>_mve_store): Use
mve_memory_operand.
(*movmisalign<mode>_mve_load): Likewise.
* config/arm/vec-common.md (movmisalign<mode>): Convert to generator
form...
(@movmisalign<mode>): ... thus. Use generic predicates and then
rework operands if they are not valid. For MVE rework to a
narrower element size if the alignment is not high enough.
Jakub Jelinek [Thu, 1 Sep 2022 09:07:44 +0000 (11:07 +0200)]
Fix up dump_printf_loc format attribute and adjust uses [PR106782]
As discussed on IRC, the r13-2299-g68c61c2daa1f bug only got missed
because dump_printf_loc had incorrect format attribute and therefore
almost no -Wformat=* checking was performed on it.
3, 0 are suitable for function with (whatever, whatever, const char *, va_list)
arguments, not for (whatever, whatever, const char *, ...), that one should
use 3, 4.
There are 3 spots where the mismatch was worse though, two using %u or %d
for unsigned HOST_WIDE_INT argument and one %T for enum argument (promoted
to int) and this backport just fixes those spots.
2022-09-01 Jakub Jelinek <jakub@redhat.com>
PR other/106782
* tree-vect-slp.cc (vect_print_slp_tree): Use
HOST_WIDE_INT_PRINT_UNSIGNED instead of %u.
* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Use
HOST_WIDE_INT_PRINT_UNSIGNED instead of %d.
* tree-vect-slp-patterns.cc (vect_pattern_validate_optab): Use %G
instead of %T and STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node))
instead of SLP_TREE_DEF_TYPE (node).
Marek Polacek [Mon, 29 Aug 2022 20:54:05 +0000 (16:54 -0400)]
c++: __has_builtin gives the wrong answer [PR106759]
We've supported __is_nothrow_constructible since r11-4386, but
names_builtin_p didn't know about it, so it gave the wrong answer for
#if __has_builtin(__is_nothrow_constructible)
...
#endif
I've tested all C++-only built-ins and only two were missing.
PR c++/106759
gcc/cp/ChangeLog:
* cp-objcp-common.cc (names_builtin_p): Handle RID_IS_NOTHROW_ASSIGNABLE
and RID_IS_NOTHROW_CONSTRUCTIBLE.
Peter Bergner [Sun, 28 Aug 2022 00:44:16 +0000 (19:44 -0500)]
rs6000: Allow conversions of MMA pointer types [PR106017]
GCC incorrectly disables conversions between MMA pointer types, which
are allowed with clang. The original intent was to disable conversions
between MMA types and other other types, but pointer conversions should
have been allowed. The fix is to just remove the MMA pointer conversion
handling code altogether.
H.J. Lu [Thu, 18 Aug 2022 21:17:33 +0000 (14:17 -0700)]
x86: Cast stride to __PTRDIFF_TYPE__ in AMX intrinsics
On 64-bit Windows, long is 32 bits and can't be used as stride in memory
operand when base is a pointer which is 64 bits. Cast stride to
__PTRDIFF_TYPE__, instead of long.