Jonathan Wakely [Wed, 6 Dec 2023 13:39:52 +0000 (13:39 +0000)]
libstdc++: Make __gnu_debug::vector usable in constant expressions [PR109536]
This makes constexpr std::vector (mostly) work in Debug Mode. All safe
iterator instrumentation and checking is disabled during constant
evaluation, because it requires mutex locks and calls to non-inline
functions defined in libstdc++.so. It should be OK to disable the safety
checks, because most UB should be detected during constant evaluation
anyway.
We could try to enable the full checking in constexpr, but it would mean
wrapping all the non-inline functions like _M_attach with an inline
_M_constexpr_attach that does the iterator housekeeping inline without
mutex locks when called for constant evaluation, and calls the
non-inline function at runtime. That could be done in future if we find
that we've lost safety or useful checking by disabling the safe
iterators.
There are a few test failures in C++20 mode, which I'm unable to
explain. The _Safe_iterator::operator++() member gives errors for using
non-constexpr functions during constant evaluation, even though those
functions are guarded by std::is_constant_evaluated() checks. The same
code works fine for C++23 and up.
Richard Biener [Thu, 14 Dec 2023 15:00:50 +0000 (16:00 +0100)]
tree-optimization/113018 - ICE with BB reduction vectorization
When BB reduction vectorization picks up a chain with an ASM def
in it and that's inside the vectorized region we fail to get its
LHS. Instead of trying to get the correct def the following
avoids vectorizing such def and instead keeps it as def to add
in the epilog.
PR tree-optimization/113018
* tree-vect-slp.cc (vect_slp_check_for_roots): Only start
SLP discovery from stmts with a LHS.
Patrick Palka [Thu, 14 Dec 2023 15:18:48 +0000 (10:18 -0500)]
c++: Implement P2582R1, CTAD from inherited constructors
This patch implements C++23 class template argument deduction from
inherited constructors, the mechanism for which relies on alias
CTAD which we already fully support. The process for transforming
the return type of an inherited guide is specified in terms of a
partially specialized class template, but this patch implements it in
a simpler way, effectively performing ahead of time deduction instead
of instantiation time deduction. I wasn't able to find an example for
which this implementation strategy makes a difference, but I didn't
look very hard. Support seems good enough to advertise as complete
but there doesn't seem to be a feature-test macro update for this
feature yet. There should be no functional change before C++23 mode.
There's a couple of FIXMEs, one in inherited_ctad_tweaks for recognizing
more forms of inherited constructors, and one in deduction_guides_for for
making the cache aware of base-class dependencies.
gcc/cp/ChangeLog:
* cp-tree.h (type_targs_deducible_from): Adjust return type.
* pt.cc (alias_ctad_tweaks): Also handle C++23 inherited CTAD.
(inherited_ctad_tweaks): Define.
(type_targs_deducible_from): Return the deduced arguments or
NULL_TREE instead of a bool. Handle 'tmpl' being a TREE_LIST
representing a synthetic alias template.
(ctor_deduction_guides_for): Do inherited_ctad_tweaks for each
USING_DECL in C++23 mode.
(deduction_guides_for): Add FIXME for stale cache entries in
light of inherited CTAD.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction67.C: Accept in C++23 mode.
* g++.dg/cpp23/class-deduction-inherited1.C: New test.
* g++.dg/cpp23/class-deduction-inherited2.C: New test.
* g++.dg/cpp23/class-deduction-inherited3.C: New test.
* g++.dg/cpp23/class-deduction-inherited4.C: New test.
Richard Biener [Wed, 13 Dec 2023 13:23:31 +0000 (14:23 +0100)]
tree-optimization/112793 - SLP of constant/external code-generated twice
The following makes the attempt at code-generating a constant/external
SLP node twice well-formed as that can happen when partitioning BB
vectorization attempts where we keep constants/externals unpartitioned.
PR tree-optimization/112793
* tree-vect-slp.cc (vect_schedule_slp_node): Already
code-generated constant/external nodes are OK.
David Malcolm [Thu, 14 Dec 2023 14:10:10 +0000 (09:10 -0500)]
analyzer: cleanups [PR112655]
Avoid copying eedges in infinite_loop::infinite_loop.
Use initializer lists in the various places reported in
PR analyzer/112655 (apart from coord_test's ctor, which
would require nontrivial refactoring).
gcc/analyzer/ChangeLog:
PR analyzer/112655
* infinite-loop.cc (infinite_loop::infinite_loop): Pass eedges
via rvalue reference rather than by value.
(starts_infinite_loop_p): Move eedges when constructing an
infinite_loop instance.
* sm-file.cc (fileptr_state_machine::fileptr_state_machine): Use
initializer list for states.
* sm-sensitive.cc
(sensitive_state_machine::sensitive_state_machine): Likewise.
* sm-signal.cc (signal_state_machine::signal_state_machine):
Likewise.
* sm-taint.cc (taint_state_machine::taint_state_machine):
Likewise.
* varargs.cc (va_list_state_machine::va_list_state_machine): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
aarch64: Improve handling of accumulators in early-ra
Being very simplistic, early-ra just models an allocno's live range
as a single interval. This doesn't work well for single-register
accumulators that are updated multiple times in a loop, since in
SSA form, each intermediate result will be a separate SSA name and
will remain separate from the accumulator even after out-of-ssa.
This means that in something like:
for (;;)
{
x = x + ...;
x = x + ...;
}
the first definition of x and the second use will be a separate pseudo
from the "main" loop-carried pseudo.
A real RA would fix this by keeping general, segmented live ranges.
But that feels like a slippery slope in this context.
This patch instead looks for sharability at a more local level,
as described in the comments. It's a bit hackish, but hopefully
not too much.
The patch also contains some small tweaks that are needed to make
the new and existing tests pass:
- fix a case where a pseudo that was only moved was wrongly treated
as not an FPR candidate
- fix some bookkeeping related to is_strong_copy_src
- use the number of FPR preferences as a tiebreaker when sorting colors
I fully expect that we'll need to be more aggressive at skipping the
early-ra allocation. For example, it probably makes sense to refuse any
allocation that involves an FPR move. But I'd like to keep collecting
examples of where things go wrong first, so that hopefully we can improve
the cases with strided registers or structures.
gcc/
* config/aarch64/aarch64-early-ra.cc (allocno_info::is_equiv): New
member variable.
(allocno_info::equiv_allocno): Replace with...
(allocno_info::related_allocno): ...this member variable.
(allocno_info::chain_prev): Put into an enum with...
(allocno_info::last_use_point): ...this new member variable.
(color_info::num_fpr_preferences): New member variable.
(early_ra::m_shared_allocnos): Likewise.
(allocno_info::is_shared): New member function.
(allocno_info::is_equiv_to): Likewise.
(early_ra::dump_allocnos): Dump sharing information. Tweak column
widths.
(early_ra::fpr_preference): Check ALLOWS_NONFPR before returning -2.
(early_ra::start_new_region): Handle m_shared_allocnos.
(early_ra::create_allocno_group): Set related_allocno rather than
equiv_allocno.
(early_ra::record_allocno_use): Likewise. Detect multiple calls
for the same program point. Update last_use_point and is_equiv.
Clear is_strong_copy_src rather than is_strong_copy_dest.
(early_ra::record_allocno_def): Use related_allocno rather than
equiv_allocno. Update last_use_point.
(early_ra::valid_equivalence_p): Replace with...
(early_ra::find_related_start): ...this new function.
(early_ra::record_copy): Look for cases where a destination copy chain
can be shared with the source allocno.
(early_ra::find_strided_accesses): Update for equiv_allocno->
related_allocno change. Only call consider_strong_copy_src_chain
at the head of a copy chain.
(early_ra::is_chain_candidate): Skip shared allocnos. Update for
new representation of equivalent allocnos.
(early_ra::chain_allocnos): Update for new representation of
equivalent allocnos.
(early_ra::try_to_chain_allocnos): Likewise.
(early_ra::merge_fpr_info): New function, split out from...
(early_ra::set_single_color_rep): ...here.
(early_ra::form_chains): Handle shared allocnos.
(early_ra::process_copies): Count the number of FPR preferences.
(early_ra::cmp_decreasing_size): Rename to...
(early_ra::cmp_allocation_order): ...this. Sort equal-sized groups
by the number of FPR preferences.
(early_ra::finalize_allocation): Handle shared allocnos.
(early_ra::process_region): Reset chain_prev as well as chain_next.
gcc/testsuite/
* gcc.target/aarch64/sve/accumulators_1.c: New test.
* gcc.target/aarch64/sve/acle/asm/create2_1.c: Allow the moves to
be in any order.
* gcc.target/aarch64/sve/acle/asm/create3_1.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/create4_1.c: Likewise.
Alexandre Oliva [Thu, 14 Dec 2023 13:41:13 +0000 (10:41 -0300)]
strub: handle volatile promoted args in internal strub [PR112938]
When generating code for an internal strub wrapper, don't clear the
DECL_NOT_GIMPLE_REG_P flag of volatile args, and gimplify them both
before and after any conversion.
While at that, move variable TMP into narrower scopes so that it's
more trivial to track where ARG lives.
Jeff Law [Thu, 14 Dec 2023 13:31:49 +0000 (06:31 -0700)]
[committed] Fix m68k testcase for c99
More fallout from the c99 conversion. The m68k specific test pr63347.c calls
exit and abort without a prototype in scope. This patch turns them into
__builtin calls avoiding the error.
Bootstrapped and regression tested on m68k-linux-gnu, pushed to the trunk.
gcc/testsuite
* gcc.target/m68k/pr63347.c: Call __builtin_abort and __builtin_exit
instead of abort and exit.
Thomas Schwinge [Thu, 14 Dec 2023 13:12:45 +0000 (14:12 +0100)]
In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM' instead of '#include <algorithm>'
... to avoid issues such as:
In file included from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/xmmintrin.h:34:0,
from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/x86intrin.h:31,
from [...]/i686-pc-linux-gnu/include/c++/5.2.0/i686-pc-linux-gnu/64/bits/opt_random.h:33,
from [...]/i686-pc-linux-gnu/include/c++/5.2.0/random:50,
from [...]/i686-pc-linux-gnu/include/c++/5.2.0/bits/stl_algo.h:66,
from [...]/i686-pc-linux-gnu/include/c++/5.2.0/algorithm:62,
from [...]/source-gcc/gcc/gimple-ssa-sccopy.cc:32:
[...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
return malloc (size);
^
make[2]: *** [Makefile:1197: gimple-ssa-sccopy.o] Error 1
Jakub Jelinek [Thu, 14 Dec 2023 11:06:59 +0000 (12:06 +0100)]
match.pd: Simplify (t * u) / (t * v) [PR112994]
On top of the previously posted patch, this simplifies say (x * 16) / (x * 4)
into 4. Unlike the previous pattern, this is something we didn't fold
previously on GENERIC, so I think it shouldn't be all wrapped with #if
GIMPLE. The question whether there should be fold_overflow_warning for the
TYPE_OVERFLOW_UNDEFINED case remains.
2023-12-14 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112994
* match.pd ((t * u) / (t * v) -> (u / v)): New simplification.
Jakub Jelinek [Thu, 14 Dec 2023 10:55:49 +0000 (11:55 +0100)]
match.pd: Simplify (t * u) / v -> t * (u / v) [PR112994]
The following testcase is optimized just on GENERIC (using
strict_overflow_p = false;
if (TREE_CODE (arg1) == INTEGER_CST
&& (tem = extract_muldiv (op0, arg1, code, NULL_TREE,
&strict_overflow_p)) != 0)
{
if (strict_overflow_p)
fold_overflow_warning (("assuming signed overflow does not occur "
"when simplifying division"),
WARN_STRICT_OVERFLOW_MISC);
return fold_convert_loc (loc, type, tem);
}
) but not on GIMPLE.
An earlier version of the patch regressed
+FAIL: gcc.dg/Wstrict-overflow-3.c correct warning (test for warnings, line 12)
test, we are indeed assuming that signed overflow does not occur
when simplifying division in there.
This version of the patch (which provides the simplification only
for GIMPLE) fixes that.
And/or we could add the
fold_overflow_warning (("assuming signed overflow does not occur "
"when simplifying division"),
WARN_STRICT_OVERFLOW_MISC);
call into the simplification, but in that case IMHO it should go into
the (t * u) / u -> t simplification as well, there we assume the exact
same thing (of course, in both cases only in the spots where we don't
verify it through ranger that it never overflows).
Guarding the whole simplification to GIMPLE only IMHO makes sense because
the above mentioned folding does it for GENERIC (and extract_muldiv even
handles far more cases, dunno how many from that we should be doing on
GIMPLE in match.pd and what could be done elsewhere; e.g. extract_muldiv
can handle (x * 16 + y * 32) / 8 -> x * 2 + y * 4 etc.).
Dunno about the fold_overflow_warning, I always have doubts about why
such a warning is useful to users.
2023-12-14 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112994
* match.pd ((t * 2) / 2 -> t): Adjust comment to use u instead of 2.
Punt without range checks if TYPE_OVERFLOW_SANITIZED.
((t * u) / v -> t * (u / v)): New simplification.
Filip Kastl [Thu, 14 Dec 2023 10:29:31 +0000 (11:29 +0100)]
A new copy propagation and PHI elimination pass
This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:
_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurences of _5 and _6 by _1 and _7 by 16
It also handles more complicated situations, e.g.:
_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurences of _8, _9 and _10 by _1
gcc/ChangeLog:
* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
RTL expansion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* gimple-ssa-sccopy.cc: New file.
Martin Jambor [Thu, 14 Dec 2023 10:09:06 +0000 (11:09 +0100)]
SRA: Relax requirements to use build_reconstructed_reference (PR 111807)
This patch half-reverts 3aaf704bca3e and replaces it with a fix with
relaxed requiremets for invoking build_reconstructed_reference in
build_ref_for_model.
build_ref_for_model/build_ref_for_offset is used in two slightly
different contexts. The first is when we are looking at an assignmernt
like
p->field_A.field_B = s.field_B;
and we have a replacements for e.g. s.field_B.field_C.field_D and we
want to store them directly to p->field_A.field_B.field_C.field_D (as
opposed to going through s or using a MEM_REF based in
p->field_A.field_B). In this case, the offset of the
"model" (s.field_B.field_C.field_D) within this can be different than
offset within the LHS that we want to reach (field_C.field_D within
the "base" p->field_A.field_B). Patch 3aaf704bca3e has caused us to
unnecessarily create MEM_REFs for these situations. These uses of
build_ref_for_model work with the relaxed condition just fine.
The second, problematic, context is when somewhere in the function we
have an assignment
s.field_A = t.field_A.field_B;
and we are creating an access structure to represent s.field_A.field_B
even if it is not actually accessed in the original input. This is
done after scanning the entire function body and we need to construct
a "universal" reference to s.field_A.field_B. In this case the "base"
is "s" and it has to be the DECL itself and not some reference for it
because for arbitrary references we need a GSI pointing to a statement
which we don't have, the reference is supposed to be universal.
But then using build_ref_for_model and within it
build_reconstructed_reference misbihaves if the expression contains
any ARRAY_REFs. In the first case those are fine because as we
eventually reach the aggregate type that matches a real LHS or RHS, we
know we we can just bolt the rest of the references onto it and end up
with the correct overall reference. However when dealing with
s.array[1].field_A = s.array[2].field_B;
we cannot just bolt array[2] reference when we want array[1] but that
is exactly what happens when we use build_reconstructed_reference and
keep it walking all the way to s.
I was consiering making all users of the second kind use directly
build_ref_for_offset instead of build_ref_for_model but the latter
also handles COMPONENT_REFs to bit-fields which the former does not.
THerefore I have deided to use the NULL-ness of GSI as an indicator
how strict we need to be. I have changed the function comment to
reflect that.
I have been able to observe diambiguation improvements with this patch
over currenct master, we do successfuly manage a few more
aliasing_component_refs_p disambiguations when compiling cc1, going
from:
Alias oracle query stats:
refs_may_alias_p: 94354287 disambiguations, 106279231 queries
ref_maybe_used_by_call_p: 1572511 disambiguations, 95618222 queries
call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
stmt_kills_ref_p: 142342 kills, 8407309 queries
nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries
aliasing_component_refs_p: 67090 disambiguations, 3081766 queries
TBAA oracle: 22675296 disambiguations 61781978 queries 14045969 are in alias set 0 10997085 queries asked about the same object
153 queries asked about the same alias set
0 access volatile 12485774 are dependent in the DAG 1577701 are aritificially in conflict with void *
Alias oracle query stats:
refs_may_alias_p: 94354083 disambiguations, 106278948 queries
ref_maybe_used_by_call_p: 1572511 disambiguations, 95618018 queries
call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
stmt_kills_ref_p: 142342 kills, 8407310 queries
nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries
aliasing_component_refs_p: 67104 disambiguations, 3081781 queries
TBAA oracle: 22676608 disambiguations 61782455 queries 14044948 are in alias set 0 10998619 queries asked about the same object
153 queries asked about the same alias set
0 access volatile 12484882 are dependent in the DAG 1577245 are aritificially in conflict with void *
PR tree-optimization/111807
* tree-sra.cc (build_ref_for_model): Allow offset smaller than
model->offset when gsi is non-NULL. Adjust function comment.
liuhongt [Wed, 13 Dec 2023 03:20:46 +0000 (11:20 +0800)]
Force broadcast constant to mem for vec_dup{v4di,v8si,v4df,v8df} when TARGET_AVX2 is not available.
vpbroadcastd/vpbroadcastq is avaiable under TARGET_AVX2, but
vec_dup{v4di,v8si} pattern is avaiable under AVX with memory operand.
And it will cause LRA/Reload to generate spill and reload if we put
constant in register.
gcc/ChangeLog:
PR target/112992
* config/i386/i386-expand.cc
(ix86_convert_const_wide_int_to_broadcast): Don't convert to
broadcast for vec_dup{v4di,v8si} when TARGET_AVX2 is not
available.
(ix86_broadcast_from_constant): Allow broadcast for V4DI/V8SI
when !TARGET_AVX2 since it will be forced to memory later.
(ix86_expand_vector_move): Force constant to mem for
vec_dup{vssi,v4di} when TARGET_AVX2 is not available.
Jakub Jelinek [Thu, 14 Dec 2023 07:01:04 +0000 (08:01 +0100)]
testsuite: Fix up pr112904.C test [PR112904]
On Fri, Dec 08, 2023 at 03:12:00PM +0800, liuhongt wrote:
> * g++.target/i386/pr112904.C: New test.
The new test FAILs on i686-linux and even on x86_64-linux I think
it doesn't actually test what was reported, unless one performs testing
with -march= for some XOP enabled CPU or -mxop.
The following patch fixes that, tested on x86_64-linux with
make check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse/-mno-mmx,-m64\} i386.exp=pr112904.C'
2023-12-14 Jakub Jelinek <jakub@redhat.com>
PR target/112904
* g++.target/i386/pr112904.C: Add dg-do compile, dg-options -mxop
and for ia32 also dg-additional-options -mmmx.
Jakub Jelinek [Thu, 14 Dec 2023 06:57:34 +0000 (07:57 +0100)]
c++: Fix tinst_level::to_list [PR112968]
With valgrind checking, there are various errors reported on some C++26
libstdc++ tests, like:
==2009913== Conditional jump or move depends on uninitialised value(s)
==2009913== at 0x914C59: gt_ggc_mx_lang_tree_node(void*) (gt-cp-tree.h:107)
==2009913== by 0x8AB7A5: gt_ggc_mx_tinst_level(void*) (gt-cp-pt.h:32)
==2009913== by 0xB89B25: ggc_mark_root_tab(ggc_root_tab const*) (ggc-common.cc:75)
==2009913== by 0xB89DF4: ggc_mark_roots() (ggc-common.cc:104)
==2009913== by 0x9D6311: ggc_collect(ggc_collect) (ggc-page.cc:2227)
==2009913== by 0xDB70F6: execute_one_pass(opt_pass*) (passes.cc:2738)
==2009913== by 0xDB721F: execute_pass_list_1(opt_pass*) (passes.cc:2755)
==2009913== by 0xDB7258: execute_pass_list(function*, opt_pass*) (passes.cc:2766)
==2009913== by 0xA55525: cgraph_node::analyze() (cgraphunit.cc:695)
==2009913== by 0xA57CC7: analyze_functions(bool) (cgraphunit.cc:1248)
==2009913== by 0xA5890D: symbol_table::finalize_compilation_unit() (cgraphunit.cc:2555)
==2009913== by 0xEB02A1: compile_file() (toplev.cc:473)
I think the problem is in the tinst_level::to_list optimization from 2018.
That function returns a TREE_LIST with TREE_PURPOSE/TREE_VALUE filled in.
Either it freshly allocates using build_tree_list (NULL, NULL); + stores
TREE_PURPOSE/TREE_VALUE, that case is fine (the whole tree_list object
is zeros, except for TREE_CODE set to TREE_LIST and TREE_PURPOSE/TREE_VALUE
modified later; the above also means in particular TREE_TYPE of it is NULL
and TREE_CHAIN is NULL and both are accessible/initialized even in valgrind
annotations.
Or it grabs a TREE_LIST node from a freelist.
If defined(ENABLE_GC_CHECKING), the object is still all zeros except
for TREE_CODE/TREE_PURPOSE/TREE_VALUE like in the fresh allocation case
(but unlike the build_tree_list case in the valgrind annotations
TREE_TYPE and TREE_CHAIN are marked as uninitialized).
If !defined(ENABLE_GC_CHECKING), I believe the actual memory content
is that everything but TREE_CODE/TREE_PURPOSE/TREE_VALUE/TREE_CHAIN is
zeros and TREE_CHAIN is something random (whatever next entry is in the
freelist, nothing overwrote it) and from valgrind POV again,
TREE_TYPE and TREE_CHAIN are marked as uninitialized.
When using the other freelist instantiations (pending_template and
tinst_level) I believe everything is correct, from valgrind POV it marks
the whole pending_template or tinst_level as uninitialized, but the
caller initializes it all).
One way to fix this would be let tinst_level::to_list not store just
TREE_PURPOSE (ret) = tldcl;
TREE_VALUE (ret) = targs;
but also
TREE_TYPE (ret) = NULL_TREE;
TREE_CHAIN (ret) = NULL_TREE;
Though, that seems like wasted effort in the build_tree_list case to me.
So, the following patch instead does that TREE_CHAIN = NULL_TREE store only
in the case where it isn't already done (and likewise for TREE_TYPE just to
be sure) and marks both TREE_CHAIN and TREE_TYPE as initialized (the latter
is at that spot, the former is because we never really touch TREE_TYPE of a
TREE_LIST anywhere and so the NULL gets stored into the freelist and
restored from there (except for ENABLE_GC_CHECKING where it is poisoned
and then cleared again).
2023-12-14 Jakub Jelinek <jakub@redhat.com>
PR c++/112968
* pt.cc (freelist<tree_node>::reinit): Make whole obj->common
defined for valgrind annotations rather than just obj->base,
and do it even for ENABLE_GC_CHECKING. If not ENABLE_GC_CHECKING,
clear TREE_CHAIN (obj) and TREE_TYPE (obj).
Marek Polacek [Wed, 13 Dec 2023 23:25:47 +0000 (18:25 -0500)]
c++: fix cpp0x/constexpr-ex1.C in C++23
Since r14-6505 I see:
FAIL: g++.dg/cpp0x/constexpr-ex1.C -std=c++23 at line 91 (test for errors, line 89)
FAIL: g++.dg/cpp0x/constexpr-ex1.C -std=c++23 (test for excess errors)
FAIL: g++.dg/cpp0x/constexpr-ex1.C -std=c++26 at line 91 (test for errors, line 89)
FAIL: g++.dg/cpp0x/constexpr-ex1.C -std=c++26 (test for excess errors)
and it wasn't fixed by r14-6511. So I'm fixing it with the below.
Richard Ball [Wed, 13 Dec 2023 21:34:57 +0000 (21:34 +0000)]
aarch64: SVE/NEON Bridging intrinsics
ACLE has added intrinsics to bridge between SVE and Neon.
The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
SVE vectors.
This patch adds support to GCC for the following 3 intrinsics:
svset_neonq, svget_neonq and svdup_neonq
gcc/ChangeLog:
* config.gcc: Adds new header to config.
* config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
Moved to header file.
(ENTRY): Likewise.
(enum aarch64_simd_type): Likewise.
(struct aarch64_simd_type_info): Remove static.
(GTY): Likewise.
* config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
Defines pragma for arm_neon_sve_bridge.h.
* config/aarch64/aarch64-protos.h:
Add handle_arm_neon_sve_bridge_h
* config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
* config/aarch64/aarch64-sve-builtins-base.cc
(class svget_neonq_impl): New intrinsic implementation.
(class svset_neonq_impl): Likewise.
(class svdup_neonq_impl): Likewise.
(NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
* config/aarch64/aarch64-sve-builtins-functions.h
(NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
functions.
* config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_element_type): Add NEON element types.
(parse_type): Likewise.
(struct get_neonq_def): Defines function shape for get_neonq.
(struct set_neonq_def): Defines function shape for set_neonq.
(struct dup_neonq_def): Defines function shape for dup_neonq.
* config/aarch64/aarch64-sve-builtins.cc
(DEF_SVE_TYPE_SUFFIX): Changed to be called through
SVE_NEON macro.
(DEF_SVE_NEON_TYPE_SUFFIX): Defines
macro for NEON_SVE_BRIDGE type suffixes.
(DEF_NEON_SVE_FUNCTION): Defines
macro for NEON_SVE_BRIDGE functions.
(function_resolver::infer_neon128_vector_type): Infers type suffix
for overloaded functions.
(handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes.
(bf16): Replace entry with neon-sve entry.
(f16): Likewise.
(f32): Likewise.
(f64): Likewise.
(s8): Likewise.
(s16): Likewise.
(s32): Likewise.
(s64): Likewise.
(u8): Likewise.
(u16): Likewise.
(u32): Likewise.
(u64): Likewise.
* config/aarch64/aarch64-sve-builtins.h
(GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
(ENTRY): Add aarch64_simd_type definiton.
(enum aarch64_simd_type): Add neon information to type_suffix_info.
(struct type_suffix_info): New function.
* config/aarch64/aarch64-sve.md
(@aarch64_sve_get_neonq_<mode>): New intrinsic insn for big endian.
(@aarch64_sve_set_neonq_<mode>): Likewise.
* config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
* config/aarch64/aarch64-builtins.h: New file.
* config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
* config/aarch64/arm_neon_sve_bridge.h: New file.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include
arm_neon_sve_bridge header file
* gcc.dg/torture/neon-sve-bridge.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/general-c/dup_neonq_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/get_neonq_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/set_neonq_1.c: New test.
These notes are controlled by a new command line flag
-fdiagnostics-all-candidates which also controls whether we note
ignored candidates more generally.
* call.cc (print_z_candidates): Only print ignored candidates
when -fdiagnostics-all-candidates is set, otherwise suggest
the flag.
(build_over_call): When diagnosing deletedness, note
other candidates only if -fdiagnostics-all-candidates is
set, otherwise suggest the flag.
gcc/testsuite/ChangeLog:
* g++.dg/overload/error6.C: Pass -fdiagnostics-all-candidates.
* g++.dg/cpp0x/deleted16.C: New test.
* g++.dg/cpp0x/deleted16a.C: New test.
* g++.dg/overload/error6a.C: New test.
Patrick Palka [Wed, 13 Dec 2023 21:46:01 +0000 (16:46 -0500)]
c++: remember candidates that we ignored
During overload resolution, we sometimes outright ignore a function in
the overload set and leave no trace of it in the candidates list, for
example when we find a perfect non-template candidate we discard all
function templates, or when the callee is a template-id we discard all
non-template functions. We should still however make note of these
non-viable functions when diagnosing overload resolution failure, but
that's not possible if they're not present in the returned candidates
list.
To that end, this patch reworks add_candidates to add such ignored
functions to the list. The new rr_ignored rejection reason is somewhat
of a catch-all; we could perhaps split it up into more specific rejection
reasons, but I leave that as future work.
gcc/cp/ChangeLog:
* call.cc (enum rejection_reason_code): Add rr_ignored.
(add_ignored_candidate): Define.
(ignored_candidate_p): Define.
(add_template_candidate_real): Do add_ignored_candidate
instead of returning NULL.
(splice_viable): Put ignored (non-viable) candidates last.
(print_z_candidate): Handle ignored candidates.
(build_new_function_call): Refine shortcut that calls
cp_build_function_call_vec now that non-templates can
appear in the candidate list for a template-id call.
(add_candidates): Replace 'bad_fns' overload with 'bad_cands'
candidate list. When not considering a candidate, add it
to the list as an ignored candidate. Add all 'bad_cands'
to the overload set as well.
gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/param-type-mismatch-2.C: Rename template
function test_7 that (maybe accidentally) shares the same name
as its non-template callee.
* g++.dg/overload/error6.C: New test.
Patrick Palka [Wed, 13 Dec 2023 21:45:50 +0000 (16:45 -0500)]
c++: sort candidates according to viability
This patch:
* changes splice_viable to move the non-viable candidates to the end
of the list instead of removing them outright
* makes tourney move the best candidate to the front of the candidate
list
* adjusts print_z_candidates to preserve our behavior of printing only
viable candidates when diagnosing ambiguity
* adds a parameter to print_z_candidates to control this default behavior
(the follow-up patch will want to print all candidates when diagnosing
deletedness)
Thus after this patch we have access to the entire candidate list through
the best viable candidate.
This change also happens to fix diagnostics for the below testcase where
we currently neglect to note the third candidate, since the presence of
the two unordered non-strictly viable candidates causes splice_viable to
prematurely get rid of the non-viable third candidate.
gcc/cp/ChangeLog:
* call.cc: Include "tristate.h".
(splice_viable): Sort the candidate list according to viability.
Don't remove non-viable candidates from the list.
(print_z_candidates): Add defaulted only_viable_p parameter.
By default only print non-viable candidates if there is no
viable candidate.
(tourney): Ignore non-viable candidates. Move the true champ to
the front of the candidates list, and update 'candidates' to
point to the front. Rename champ_compared_to_predecessor to
previous_worse_champ.
Patrick Palka [Wed, 13 Dec 2023 20:55:14 +0000 (15:55 -0500)]
c++: unifying constants vs their type [PR99186, PR104867]
When unifying constants we need to treat constants of different types
but same value as different in light of auto template parameters since
otherwise e.g. A<1> will unify with A<1u> (where A's template-head is
template<auto>). This patch fixes this in a minimal way; it seems we
could get away with just using template_args_equal here, as we do in the
default case, or even just cp_tree_equal since the CONVERT_EXPR_P loop
seems to be dead code, but that's a simplification we could consider
during next stage 1.
PR c++/99186
PR c++/104867
gcc/cp/ChangeLog:
* pt.cc (unify) <case INTEGER_CST>: Compare types as well.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/nontype-auto23.C: New test.
* g++.dg/cpp1z/nontype-auto24.C: New test.
Patrick Palka [Wed, 13 Dec 2023 20:55:01 +0000 (15:55 -0500)]
c++: unifying FUNCTION_DECLs [PR93740]
unify currently always returns success when unifying two FUNCTION_DECLs
(due to the is_overloaded_fn deferment within the default case), which
means for the below testcase we incorrectly unify &A::foo and &A::bar
leading to deduction failure for the index_of calls due to a bogus base
class ambiguity.
This patch makes unify handle FUNCTION_DECL naturally like other decls.
PR c++/93740
gcc/cp/ChangeLog:
* pt.cc (unify) <case FUNCTION_DECL>: Handle it like FIELD_DECL
and TEMPLATE_DECL.
Jason Merrill [Wed, 22 Nov 2023 18:20:58 +0000 (13:20 -0500)]
c-family: -Waddress-of-packed-member and casts
-Waddress-of-packed-member, in addition to the documented warning about
actually taking the address of a packed member, also warns about casting
from a pointer to a TYPE_PACKED type to a pointer to a type with greater
alignment.
This wrongly warns if the source is a pointer to enum when -fshort-enums
is on, since that is also represented by TYPE_PACKED.
And there's already -Wcast-align to catch casting from pointer to less
aligned type (packed or otherwise) to pointer to more aligned type; even
apart from the enum problem, this seems like a somewhat arbitrary subset of
that warning.
So, this patch removes the undocumented type-based warning from
-Waddress-of-packed-member. Some of the tests where the warning is
desirable I changed to use -Wcast-align=strict instead. The ones that
require -Wno-incompatible-pointer-types I just removed.
gcc/c-family/ChangeLog:
* c-warn.cc (check_address_or_pointer_of_packed_member):
Remove warning based on TYPE_PACKED.
gcc/testsuite/ChangeLog:
* c-c++-common/Waddress-of-packed-member-1.c: Don't expect
a warning on the cast cases.
* c-c++-common/pr51628-35.c: Use -Wcast-align=strict.
* g++.dg/warn/Waddress-of-packed-member3.C: Likewise.
* gcc.dg/pr88928.c: Likewise.
* gcc.dg/pr51628-20.c: Removed.
* gcc.dg/pr51628-21.c: Removed.
* gcc.dg/pr51628-25.c: Removed.
Julian Brown [Mon, 17 Oct 2022 16:44:31 +0000 (16:44 +0000)]
OpenMP: Pointers and member mappings
This patch changes the mapping node arrangement used for array components
of derived types in order to accommodate for changes made in the previous
patch, particularly the use of "GOMP_MAP_ATTACH_DETACH" for pointer-typed
derived-type members instead of "GOMP_MAP_ALWAYS_POINTER".
We change the mapping nodes used for a derived-type mapping like this:
type T
integer, pointer, dimension(:) :: arrptr
end type T
type(T) :: tvar
[...]
!$omp target map(tofrom: tvar%arrptr)
So that the nodes used look like this:
1) map(to: tvar%arrptr) -->
GOMP_MAP_TO [implicit] *tvar%arrptr%data (the array data)
GOMP_MAP_TO_PSET tvar%arrptr (the descriptor)
GOMP_MAP_ATTACH_DETACH tvar%arrptr%data
In this case, we can determine in the front-end that the
whole-array/pointer mapping (1) is only needed to map the pointer
-- so we drop it entirely. (Note also that we set -- early -- the
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for whole-array-via-pointer
mappings. See below.)
In the middle end, we process mappings using the struct sibling-list
handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle
of the group of three mapping nodes to the proper sorted position after
the GOMP_MAP_STRUCT mapping:
GOMP_MAP_STRUCT tvar (len: 1)
GOMP_MAP_TO_PSET tvar%arr (size: 64, etc.) <--. moved here
[...] |
GOMP_MAP_TOFROM *tvar%arrptr%data(3) ___|
GOMP_MAP_ATTACH_DETACH tvar%arrptr%data
In another case, if we have an array of derived-type values "dtarr",
and mappings like:
i = 1
j = 1
map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8))
We still map the same way, but this time we cannot prove that the base
expressions "dtarr(i) and "dtarr(j)" are the same in the front-end.
So we keep both mappings, but we move the "[implicit]" mapping of the
full-array reference to the end of the clause list in gimplify.cc (by
adjusting the topological sorting algorithm):
Always moving "[implicit]" full-array mappings after array-section
mappings (without that bit set) means that we'll avoid copying the whole
array unnecessarily -- even in cases where we can't prove that the arrays
are the same.
The patch also fixes some bugs with "enter data" and "exit data"
directives with this new mapping arrangement. Also now if you have
mappings like this:
#pragma omp target enter data map(to: dv, dv%arr(1:20))
The whole of the derived-type variable "dv" is mapped, so the
GOMP_MAP_TO_PSET for the array-section mapping can be dropped:
To accommodate for recent changes to mapping nodes made by
Tobias, this version of the patch avoids using GOMP_MAP_TO_PSET
for "exit data" directives, in favour of using the "correct"
GOMP_MAP_RELEASE/GOMP_MAP_DELETE kinds during early expansion. A new
flag is introduced so the middle-end knows when the latter two kinds
are being used specifically for an array descriptor.
This version of the patch fixes "omp target exit data" handling
for GOMP_MAP_DELETE, and adds pretty-printing dump output
for the OMP_CLAUSE_RELEASE_DESCRIPTOR flag (for a little extra
clarity).
Also I noticed the handling of descriptors on *OpenACC*
exit-data directives was inconsistent, so I've made those use
GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the new flag in the same way as
OpenMP too. In the end it doesn't actually matter to the runtime,
which handles GOMP_MAP_RELEASE/GOMP_MAP_DELETE/GOMP_MAP_TO_PSET for
array descriptors on OpenACC "exit data" directives the same, anyway,
and doing it this way in the FE avoids needless divergence.
I've added a couple of new tests (gomp/target-enter-exit-data.f90 and
goacc/enter-exit-data-2.f90).
2023-12-07 Julian Brown <julian@codesourcery.com>
gcc/fortran/
* dependency.cc (gfc_omp_expr_prefix_same): New function.
* dependency.h (gfc_omp_expr_prefix_same): Add prototype.
* gfortran.h (gfc_omp_namelist): Add "duplicate_of" field to "u2"
union.
* trans-openmp.cc (dependency.h): Include.
(gfc_trans_omp_array_section): Adjust mapping node arrangement for
array descriptors. Use GOMP_MAP_TO_PSET or
GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the OMP_CLAUSE_RELEASE_DESCRIPTOR
flag set.
(gfc_symbol_rooted_namelist): New function.
(gfc_trans_omp_clauses): Check subcomponent and subarray/element
accesses elsewhere in the clause list for pointers to derived types or
array descriptors, and adjust or drop mapping nodes appropriately.
Adjust for changes to mapping node arrangement.
(gfc_trans_oacc_executable_directive): Pass code op through.
gcc/
* gimplify.cc (omp_map_clause_descriptor_p): New function.
(build_omp_struct_comp_nodes, omp_get_attachment, omp_group_base): Use
above function.
(omp_tsort_mapping_groups): Process nodes that have
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P set after those that don't. Add
enter_exit_data parameter.
(omp_resolve_clause_dependencies): Remove GOMP_MAP_TO_PSET mappings if
we're mapping the whole containing derived-type variable.
(omp_accumulate_sibling_list): Adjust GOMP_MAP_TO_PSET handling.
Remove GOMP_MAP_ALWAYS_POINTER handling.
(gimplify_scan_omp_clauses): Pass enter_exit argument to
omp_tsort_mapping_groups. Don't adjust/remove GOMP_MAP_TO_PSET
mappings for derived-type components here.
* tree.h (OMP_CLAUSE_RELEASE_DESCRIPTOR): New macro.
* tree-pretty-print.cc (dump_omp_clause): Show
OMP_CLAUSE_RELEASE_DESCRIPTOR in dump output (with
GOMP_MAP_TO_PSET-like syntax).
gcc/testsuite/
* gfortran.dg/goacc/enter-exit-data-2.f90: New test.
* gfortran.dg/goacc/finalize-1.f: Adjust scan output.
* gfortran.dg/gomp/map-9.f90: Adjust scan output.
* gfortran.dg/gomp/map-subarray-2.f90: New test.
* gfortran.dg/gomp/map-subarray.f90: New test.
* gfortran.dg/gomp/target-enter-exit-data.f90: New test.
libgomp/
* testsuite/libgomp.fortran/map-subarray.f90: New test.
* testsuite/libgomp.fortran/map-subarray-2.f90: New test.
* testsuite/libgomp.fortran/map-subarray-3.f90: New test.
* testsuite/libgomp.fortran/map-subarray-4.f90: New test.
* testsuite/libgomp.fortran/map-subarray-6.f90: New test.
* testsuite/libgomp.fortran/map-subarray-7.f90: New test.
* testsuite/libgomp.fortran/map-subarray-8.f90: New test.
* testsuite/libgomp.fortran/map-subcomponents.f90: New test.
* testsuite/libgomp.fortran/struct-elem-map-1.f90: Adjust for
descriptor-mapping changes. Remove XFAIL.
Julian Brown [Mon, 14 Aug 2023 12:41:56 +0000 (12:41 +0000)]
OpenMP/OpenACC: Rework clause expansion and nested struct handling
This patch reworks clause expansion in the C, C++ and (to a lesser
extent) Fortran front ends for OpenMP and OpenACC mapping nodes used in
GPU offloading support.
At present a single clause may be turned into several mapping nodes,
or have its mapping type changed, in several places scattered through
the front- and middle-end. The analysis relating to which particular
transformations are needed for some given expression has become quite hard
to follow. Briefly, we manipulate clause types in the following places:
1. During parsing, in c_omp_adjust_map_clauses. Depending on a set of
rules, we may change a FIRSTPRIVATE_POINTER (etc.) mapping into
ATTACH_DETACH, or mark the decl addressable.
2. In semantics.cc or c-typeck.cc, clauses are expanded in
handle_omp_array_sections (called via {c_}finish_omp_clauses, or in
finish_omp_clauses itself. The two cases are for processing array
sections (the former), or non-array sections (the latter).
3. In gimplify.cc, we build sibling lists for struct accesses, which
groups and sorts accesses along with their struct base, creating
new ALLOC/RELEASE nodes for pointers.
4. In gimplify.cc:gimplify_adjust_omp_clauses, mapping nodes may be
adjusted or created.
This patch doesn't completely disrupt this scheme, though clause
types are no longer adjusted in c_omp_adjust_map_clauses (step 1).
Clause expansion in step 2 (for C and C++) now uses a single, unified
mechanism, parts of which are also reused for analysis in step 3.
Rather than the kind-of "ad-hoc" pattern matching on addresses used to
expand clauses used at present, a new method for analysing addresses is
introduced. This does a recursive-descent tree walk on expression nodes,
and emits a vector of tokens describing each "part" of the address.
This tokenized address can then be translated directly into mapping nodes,
with the assurance that no part of the expression has been inadvertently
skipped or misinterpreted. In this way, all the variations of ways
pointers, arrays, references and component accesses might be combined
can be teased apart into easily-understood cases - and we know we've
"parsed" the whole address before we start analysis, so the right code
paths can easily be selected.
For example, a simple access "arr[idx]" might parse as:
base-decl access-indexed-array
or "mystruct->foo[x]" with a pointer "foo" component might parse as:
A key observation is that support for "array" bases, e.g. accesses
whose root nodes are not structures, but describe scalars or arrays,
and also *one-level deep* structure accesses, have first-class support
in gimplify and beyond. Expressions that use deeper struct accesses
or e.g. multiple indirections were more problematic: some cases worked,
but lots of cases didn't. This patch reimplements the support for those
in gimplify.cc, again using the new "address tokenization" support.
An expression like "mystruct->foo->bar[0:10]" used in a mapping node will
translate the right-hand access directly in the front-end. The base for
the access will be "mystruct->foo". This is handled recursively in
gimplify.cc -- there may be several accesses of "mystruct"'s members
on the same directive, so the sibling-list building machinery can be
used again. (This was already being done for OpenACC, but the new
implementation differs somewhat in details, and is more robust.)
For OpenMP, in the case where the base pointer itself,
i.e. "mystruct->foo" here, is NOT mapped on the same directive, we
create a "fragile" mapping. This turns the "foo" component access
into a zero-length allocation (which is a new feature for the runtime,
so support has been added there too).
A couple of changes have been made to how mapping clauses are turned
into mapping nodes:
The first change is based on the observation that it is probably never
correct to use GOMP_MAP_ALWAYS_POINTER for component accesses (e.g. for
references), because if the containing struct is already mapped on the
target then the host version of the pointer in question will be corrupted
if the struct is copied back from the target. This patch removes all
such uses, across each of C, C++ and Fortran.
The second change is to the way that GOMP_MAP_ATTACH_DETACH nodes
are processed during sibling-list creation. For OpenMP, for pointer
components, we must map the base pointer separately from an array section
that uses the base pointer, so e.g. we must have both "map(mystruct.base)"
and "map(mystruct.base[0:10])" mappings. These create nodes such as:
we now introduce a new "mini-pass", omp_resolve_clause_dependencies, that
drops the GOMP_MAP_TOFROM for the base pointer, marks the second group
as having had a base-pointer mapping, then omp_build_struct_sibling_lists
can create:
This ends up working better in many cases, particularly those involving
references. (The "alloc" space is immediately overwritten by a pointer
attachment, so this is mildly more efficient than a redundant TO mapping
at runtime also.)
There is support in the address tokenizer for "arbitrary" base expressions
which aren't rooted at a decl, but that is not used as present because
such addresses are disallowed at parse time.
In the front-ends, the address tokenization machinery is mostly only
used for clause expansion and not for diagnostics at present. It could
be used for those too, which would allow more of my previous "address
inspector" implementation to be removed.
The new bits in gimplify.cc work with OpenACC also.
This version of the patch addresses several first-pass review comments
from Tobias, and fixes a few previously-missed cases for manually-managed
ragged array mappings (including cases using references). Some arbitrary
differences between handling of clause expansion for C vs. C++ have also
been fixed, and some fragments from later in the patch series have been
moved forward (where they were useful for fixing bugs). Several new
test cases have been added.
2023-11-29 Julian Brown <julian@codesourcery.com>
gcc/c-family/
* c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA,
C_ORT_OMP_EXIT_DATA and C_ORT_ACC_TARGET.
(omp_addr_token): Add forward declaration.
(c_omp_address_inspector): New class.
* c-omp.cc (c_omp_adjust_map_clauses): Mark decls addressable here, but
do not change any mapping node types.
(c_omp_address_inspector::unconverted_ref_origin,
c_omp_address_inspector::component_access_p,
c_omp_address_inspector::check_clause,
c_omp_address_inspector::get_root_term,
c_omp_address_inspector::map_supported_p,
c_omp_address_inspector::get_origin,
c_omp_address_inspector::maybe_unconvert_ref,
c_omp_address_inspector::maybe_zero_length_array_section,
c_omp_address_inspector::expand_array_base,
c_omp_address_inspector::expand_component_selector,
c_omp_address_inspector::expand_map_clause): New methods.
(omp_expand_access_chain): New function.
gcc/c/
* c-parser.cc (c_parser_oacc_all_clauses): Add TARGET_P parameter. Use
to select region type for c_finish_omp_clauses call.
(c_parser_oacc_loop): Update calls to c_parser_oacc_all_clauses.
(c_parser_oacc_compute): Likewise.
(c_parser_omp_target_data, c_parser_omp_target_enter_data): Support
ATTACH kind.
(c_parser_omp_target_exit_data): Support DETACH kind.
(check_clauses): Handle GOMP_MAP_POINTER and GOMP_MAP_ATTACH here.
* c-typeck.cc (handle_omp_array_sections_1,
handle_omp_array_sections, c_finish_omp_clauses): Use
c_omp_address_inspector class and OMP address tokenizer to analyze and
expand map clause expressions. Fix some diagnostics. Fix "is OpenACC"
condition for C_ORT_ACC_TARGET addition.
gcc/cp/
* parser.cc (cp_parser_oacc_all_clauses): Add TARGET_P parameter. Use
to select region type for finish_omp_clauses call.
(cp_parser_omp_target_data, cp_parser_omp_target_enter_data): Support
GOMP_MAP_ATTACH kind.
(cp_parser_omp_target_exit_data): Support GOMP_MAP_DETACH kind.
(cp_parser_oacc_declare): Update call to cp_parser_oacc_all_clauses.
(cp_parser_oacc_loop): Update calls to cp_parser_oacc_all_clauses.
(cp_parser_oacc_compute): Likewise.
* pt.cc (tsubst_expr): Use C_ORT_ACC_TARGET for call to
tsubst_omp_clauses for OpenACC compute regions.
* semantics.cc (cp_omp_address_inspector): New class, derived from
c_omp_address_inspector.
(handle_omp_array_sections_1, handle_omp_array_sections,
finish_omp_clauses): Use cp_omp_address_inspector class and OMP address
tokenizer to analyze and expand OpenMP map clause expressions. Fix
some diagnostics. Support C_ORT_ACC_TARGET.
(finish_omp_target): Handle GOMP_MAP_POINTER.
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_array_section): Add OPENMP parameter.
Use GOMP_MAP_ATTACH_DETACH instead of GOMP_MAP_ALWAYS_POINTER for
derived type components.
(gfc_trans_omp_clauses): Update calls to gfc_trans_omp_array_section.
gcc/
* gimplify.cc (build_struct_comp_nodes): Don't process
GOMP_MAP_ATTACH_DETACH "middle" nodes here.
(omp_mapping_group): Add REPROCESS_STRUCT and FRAGILE booleans for
nested struct handling.
(omp_strip_components_and_deref, omp_strip_indirections): Remove
functions.
(omp_get_attachment): Handle GOMP_MAP_DETACH here.
(omp_group_last): Handle GOMP_MAP_*, GOMP_MAP_DETACH,
GOMP_MAP_ATTACH_DETACH groups for "exit data" of reference-to-pointer
component array sections.
(omp_gather_mapping_groups_1): Initialise reprocess_struct and fragile
fields.
(omp_group_base): Handle GOMP_MAP_ATTACH_DETACH after GOMP_MAP_STRUCT.
(omp_index_mapping_groups_1): Skip reprocess_struct groups.
(omp_get_nonfirstprivate_group, omp_directive_maps_explicitly,
omp_resolve_clause_dependencies, omp_first_chained_access_token): New
functions.
(omp_check_mapping_compatibility): Adjust accepted node combinations
for "from" clauses using release instead of alloc.
(omp_accumulate_sibling_list): Add GROUP_MAP, ADDR_TOKENS, FRAGILE_P,
REPROCESSING_STRUCT, ADDED_TAIL parameters. Use OMP address tokenizer
to analyze addresses. Reimplement nested struct handling, and
implement "fragile groups".
(omp_build_struct_sibling_lists): Adjust for changes to
omp_accumulate_sibling_list. Recalculate bias for ATTACH_DETACH nodes
after GOMP_MAP_STRUCT nodes.
(gimplify_scan_omp_clauses): Call omp_resolve_clause_dependencies. Use
OMP address tokenizer.
(gimplify_adjust_omp_clauses_1): Use build_fold_indirect_ref_loc
instead of build_simple_mem_ref_loc.
* omp-general.cc (omp-general.h, tree-pretty-print.h): Include.
(omp_addr_tokenizer): New namespace.
(omp_addr_tokenizer::omp_addr_token): New.
(omp_addr_tokenizer::omp_parse_component_selector,
omp_addr_tokenizer::omp_parse_ref,
omp_addr_tokenizer::omp_parse_pointer,
omp_addr_tokenizer::omp_parse_access_method,
omp_addr_tokenizer::omp_parse_access_methods,
omp_addr_tokenizer::omp_parse_structure_base,
omp_addr_tokenizer::omp_parse_structured_expr,
omp_addr_tokenizer::omp_parse_array_expr,
omp_addr_tokenizer::omp_access_chain_p,
omp_addr_tokenizer::omp_accessed_addr): New functions.
(omp_parse_expr, debug_omp_tokenized_addr): New functions.
* omp-general.h (omp_addr_tokenizer::access_method_kinds,
omp_addr_tokenizer::structure_base_kinds,
omp_addr_tokenizer::token_type,
omp_addr_tokenizer::omp_addr_token,
omp_addr_tokenizer::omp_access_chain_p,
omp_addr_tokenizer::omp_accessed_addr): New.
(omp_addr_token, omp_parse_expr): New.
* omp-low.cc (scan_sharing_clauses): Skip error check for references
to pointers.
* tree.h (OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED): New macro.
Julian Brown [Tue, 6 Sep 2022 10:26:05 +0000 (10:26 +0000)]
OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in {c_}finish_omp_clause
This patch trivially adds braces and reindents the
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza in
c_finish_omp_clause and finish_omp_clause, in preparation for the
following patch (to clarify the diff a little).
2022-09-13 Julian Brown <julian@codesourcery.com>
gcc/c/
* c-typeck.cc (c_finish_omp_clauses): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.
gcc/cp/
* semantics.cc (finish_omp_clause): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.
Jakub Jelinek [Wed, 13 Dec 2023 20:13:22 +0000 (21:13 +0100)]
libcpp: Fix valgrind errors on pr88974.c [PR112956]
On the c-c++-common/cpp/pr88974.c testcase I'm seeing
==600549== Conditional jump or move depends on uninitialised value(s)
==600549== at 0x1DD3A05: cpp_get_token_1(cpp_reader*, unsigned int*) (macro.cc:3050)
==600549== by 0x1DBFC7F: _cpp_parse_expr (expr.cc:1392)
==600549== by 0x1DB9471: do_if(cpp_reader*) (directives.cc:2087)
==600549== by 0x1DBB4D8: _cpp_handle_directive (directives.cc:572)
==600549== by 0x1DCD488: _cpp_lex_token (lex.cc:3682)
==600549== by 0x1DD3A97: cpp_get_token_1(cpp_reader*, unsigned int*) (macro.cc:2936)
==600549== by 0x7F7EE4: scan_translation_unit (c-ppoutput.cc:350)
==600549== by 0x7F7EE4: preprocess_file(cpp_reader*) (c-ppoutput.cc:106)
==600549== by 0x7F6235: c_common_init() (c-opts.cc:1280)
==600549== by 0x704C8B: lang_dependent_init (toplev.cc:1837)
==600549== by 0x704C8B: do_compile (toplev.cc:2135)
==600549== by 0x704C8B: toplev::main(int, char**) (toplev.cc:2306)
==600549== by 0x7064BA: main (main.cc:39)
error. The problem is that _cpp_lex_direct can leave result->src_loc
uninitialized in some cases and later on we use that location_t.
fresh_line:
result->flags = 0;
...
if (buffer->need_line)
{
if (pfile->state.in_deferred_pragma)
{
result->type = CPP_PRAGMA_EOL;
... // keeps result->src_loc uninitialized;
return result;
}
if (!_cpp_get_fresh_line (pfile))
{
result->type = CPP_EOF;
if (!pfile->state.in_directive && !pfile->state.parsing_args)
{
result->src_loc = pfile->line_table->highest_line;
...
}
... // otherwise result->src_loc is sometimes uninitialized here
return result;
}
...
}
...
result->src_loc = pfile->line_table->highest_line;
...
c = *buffer->cur++;
switch (c)
{
...
case '\n':
...
buffer->need_line = true;
if (pfile->state.in_deferred_pragma)
{
result->type = CPP_PRAGMA_EOL;
...
return result;
}
goto fresh_line;
...
}
...
So, if _cpp_lex_direct is called without buffer->need_line initially set,
result->src_loc is always initialized (and actually hundreds of tests rely
on that exact value it has), even when c == '\n' and we set that flag later
on and goto fresh_line. For CPP_PRAGMA_EOL case we have in that case
separate handling and don't goto.
But if _cpp_lex_direct is called with buffer->need_line initially set and
either decide to return a CPP_PRAGMA_EOL token or if getting a new line fails
for some reason and we return an CPP_ERROR token and we are in directive
or parsing args state, it is kept uninitialized and can be whatever the
allocation left it there as.
The following patch attempts to keep the status quo, use value that was
returned previously if it was initialized (i.e. we went through the
goto fresh_line; statement in c == '\n' handling) and only initialize
result->src_loc if it was uninitialized before.
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/112956
* lex.cc (_cpp_lex_direct): Initialize c to 0.
For CPP_PRAGMA_EOL tokens and if c == 0 also for CPP_EOF
set result->src_loc to highest locus.
Thomas Schwinge [Wed, 13 Dec 2023 16:48:11 +0000 (17:48 +0100)]
Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld' format string mismatch
Fix-up for commit 348874f0baac0f22c98ab11abbfa65fd172f6bdd
"libgomp: basic pinned memory on Linux", which may result in build failures
as follow, for example, for the '-m32' multilib of x86_64-pc-linux-gnu:
In file included from [...]/source-gcc/libgomp/config/linux/allocator.c:31:
[...]/source-gcc/libgomp/config/linux/allocator.c: In function ‘linux_memspace_alloc’:
[...]/source-gcc/libgomp/config/linux/allocator.c:70:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
70 | gomp_debug (0, "libgomp: failed to pin %ld bytes of"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
71 | " memory (ulimit too low?)\n", size);
| ~~~~
| |
| size_t {aka unsigned int}
[...]/source-gcc/libgomp/libgomp.h:186:29: note: in definition of macro ‘gomp_debug’
186 | (gomp_debug) ((KIND), __VA_ARGS__); \
| ^~~~~~~~~~~
[...]/source-gcc/libgomp/config/linux/allocator.c:70:52: note: format string is defined here
70 | gomp_debug (0, "libgomp: failed to pin %ld bytes of"
| ~~^
| |
| long int
| %d
cc1: all warnings being treated as errors
make[9]: *** [allocator.lo] Error 1
make[9]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/32/libgomp'
[...]
Fix this in the same way as used elsewhere in libgomp.
libgomp/
* config/linux/allocator.c (linux_memspace_alloc): Fix 'size_t'
vs. '%ld' format string mismatch.
Jason Merrill [Wed, 13 Dec 2023 19:15:44 +0000 (14:15 -0500)]
c++: TARGET_EXPR location in default arg [PR96997]
My r14-6505-g52b4b7d7f5c7c0 change to copy the location in
build_aggr_init_expr reopened PR96997; let's fix it properly this time, by
clearing the location like we do for other trees.
PR c++/96997
gcc/cp/ChangeLog:
* tree.cc (bot_manip): Check data.clear_location for TARGET_EXPR.
For completeness here are three SHORTREAL modules which match their
LONGREAL and REAL counterparts. The datatype SHORTREAL is a GNU
extension and these modules were missing.
gcc/m2/ChangeLog:
PR modula2/112921
* gm2-libs-iso/ConvStringShort.def: New file.
* gm2-libs-iso/ConvStringShort.mod: New file.
* gm2-libs-iso/ShortConv.def: New file.
* gm2-libs-iso/ShortConv.mod: New file.
* gm2-libs-iso/ShortMath.def: New file.
* gm2-libs-iso/ShortMath.mod: New file.
* gm2-libs-iso/ShortStr.def: New file.
* gm2-libs-iso/ShortStr.mod: New file.
Nathaniel Shead [Fri, 3 Nov 2023 01:18:29 +0000 (12:18 +1100)]
c++: End lifetime of objects in constexpr after destructor call [PR71093]
This patch adds checks for using objects after they've been manually
destroyed via explicit destructor call. Currently this is only
implemented for 'top-level' objects; FIELD_DECLs and individual elements
of arrays will need a lot more work to track correctly and are left for
a future patch.
The other limitation is that destruction of parameter objects is checked
too 'early', happening at the end of the function call rather than the
end of the owning full-expression as they should be for consistency;
see cpp2a/constexpr-lifetime2.C. This is because I wasn't able to find a
good way to link the constructed parameter declarations with the
variable declarations that are actually destroyed later on to propagate
their lifetime status, so I'm leaving this for a later patch.
PR c++/71093
gcc/cp/ChangeLog:
* constexpr.cc (constexpr_global_ctx::get_value_ptr): Don't
return NULL_TREE for objects we're initializing.
(constexpr_global_ctx::destroy_value): Rename from remove_value.
Only mark real variables as outside lifetime.
(constexpr_global_ctx::clear_value): New function.
(destroy_value_checked): New function.
(cxx_eval_call_expression): Defer complaining about non-constant
arg0 for operator delete. Use remove_value_safe.
(cxx_fold_indirect_ref_1): Handle conversion to 'as base' type.
(outside_lifetime_error): Include name of object we're
accessing.
(cxx_eval_store_expression): Handle clobbers. Improve error
messages.
(cxx_eval_constant_expression): Use remove_value_safe. Clear
bind variables before entering body.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-lifetime1.C: Improve error message.
* g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
* g++.dg/cpp2a/bitfield2.C: Likewise.
* g++.dg/cpp2a/constexpr-new3.C: Likewise. New check.
* g++.dg/cpp1y/constexpr-lifetime7.C: New test.
* g++.dg/cpp2a/constexpr-lifetime1.C: New test.
* g++.dg/cpp2a/constexpr-lifetime2.C: New test.
Jason Merrill [Wed, 13 Dec 2023 00:20:27 +0000 (19:20 -0500)]
c++: fix in-charge parm in constexpr
I was puzzled by the proposed patch for PR71093 specifically ignoring the
in-charge parameter; the problem turned out to be that when
cxx_eval_call_expression jumps from the clone to the cloned function, it
assumes that the latter has the same parameters, and so the in-charge parm
doesn't get an argument. Since a class with vbases can't have constexpr
'tors there isn't actually a need for an in-charge parameter in a
destructor, but we used to use it for deleting destructors and never removed
it. I have a patch to do that for GCC 15, but for now let's work around it.
Jason Merrill [Wed, 13 Dec 2023 03:53:10 +0000 (22:53 -0500)]
c++: constant direct-initialization [PR108243]
When testing the proposed patch for PR71093 I noticed that it changed the
diagnostic for consteval-prop6.C. I then noticed that the diagnostic wasn't
very helpful either way; it was complaining about modification of the 'x'
variable, but it's not a problem to initialize a local variable with a
consteval constructor as long as the value is actually constant, we want to
know why the value isn't constant. And then it turned out that this also
fixed a missed-optimization bug in the testsuite.
PR c++/108243
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_outermost_constant_expr): Turn
a constructor CALL_EXPR into a TARGET_EXPR.
Andrew Stubbs [Wed, 13 Dec 2023 12:00:52 +0000 (12:00 +0000)]
amdgcn: Work around XNACK register allocation problem
The extra register pressure is causing infinite loops in some cases, especially
at -O0. I have not yet observed any issue on devices that have AVGPRs for
spilling, and XNACK is only really useful on those devices anyway, so change
the defaults.
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (NO_XNACK): Change the defaults.
* config/gcn/gcn-opts.h (enum hsaco_attr_type): Add HSACO_ATTR_DEFAULT.
* config/gcn/gcn.cc (gcn_option_override): Set the default flag_xnack.
* config/gcn/gcn.opt: Add -mxnack=default.
* doc/invoke.texi: Document the -mxnack default.
Andrew Stubbs [Mon, 31 Jul 2023 16:52:06 +0000 (17:52 +0100)]
amdgcn: Support XNACK mode
The XNACK feature allows memory load instructions to restart safely following
a page-miss interrupt. This is useful for shared-memory devices, like APUs,
and to implement OpenMP Unified Shared Memory.
To support the feature we must be able to set the appropriate meta-data and
set the load instructions to early-clobber. When the port supports scheduling
of s_waitcnt instructions there will be further requirements.
gcc/ChangeLog:
* config/gcn/gcn-hsa.h (NO_XNACK): Ignore missing -march.
(XNACKOPT): Match on/off; ignore any.
* config/gcn/gcn-valu.md (gather<mode>_insn_1offset<exec>):
Add xnack compatible alternatives.
(gather<mode>_insn_2offsets<exec>): Likewise.
* config/gcn/gcn.cc (gcn_option_override): Permit -mxnack for devices
other than Fiji and gfx1030.
(gcn_expand_epilogue): Remove early-clobber problems.
(gcn_hsa_declare_function_name): Obey -mxnack setting.
* config/gcn/gcn.md (xnack): New attribute.
(enabled): Rework to include "xnack" attribute.
(*movbi): Add xnack compatible alternatives.
(*mov<mode>_insn): Likewise.
(*mov<mode>_insn): Likewise.
(*mov<mode>_insn): Likewise.
(*movti_insn): Likewise.
* config/gcn/gcn.opt (-mxnack): Change the default to "any".
* doc/invoke.texi: Remove placeholder notice for -mxnack.
Andrew Carlotti [Mon, 27 Nov 2023 16:12:01 +0000 (16:12 +0000)]
aarch64 testsuite: Check entire .arch string
Add a terminating newline to various tests, and add missing
extensions to some test strings. The current output is broken for
options_set_4.c, so this test is left unchanged, to be fixed in a
subsequent patch.
Andrew Stubbs [Tue, 4 Jan 2022 12:22:01 +0000 (12:22 +0000)]
libgomp: basic pinned memory on Linux
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
This implementation will work OK for page-scale allocations, and finer-grained
allocations will be implemented in a future patch.
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
(omp_init_allocator): Use MEMSPACE_VALIDATE to check pinning.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
* config/gcn/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
* libgomp.texi: Switch pinned trait to supported.
(MEMSPACE_VALIDATE): Add PIN.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
Pan Li [Wed, 13 Dec 2023 13:46:14 +0000 (21:46 +0800)]
RISC-V: Refine test cases for both PR112929 and PR112988
Refine the test cases for:
* Name convention.
* Add run case.
These test cases used to cause out-of-bounds writes to the stack
and therefore showed unreliable behavior. Depending on the
execution environment they can either pass or fail. As of now,
with the latest QEMU version, they will pass even without the
underlying issue fixed. As the test case is known to have
caused the problem before we keep it as a run test case for
future reference.
PR target/112929
PR target/112988
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr112929.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr112929-1.c: ...here.
* gcc.target/riscv/rvv/vsetvl/pr112988.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr112988-1.c: ...here.
* gcc.target/riscv/rvv/vsetvl/pr112929-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112988-2.c: New test.
which uses 6 instructions to perform this particular sign extension.
It turns out that sign extensions can always be implemented using at
most three instructions on ARC (without a barrel shifter) using the
idiom ((x&mask)^msb)-msb [as described in section "2-5 Sign Extension"
of Henry Warren's book "Hacker's Delight"]. Using this, the sign
extensions above on ARC's EM both become:
bmsk_s r0,r0,4
xor r0,r0,16
sub r0,r0,16
which takes about 3 cycles, compared to the ~112 cycles for the loops
in foo.
2023-12-13 Roger Sayle <roger@nextmovesoftware.com>
Jeff Law <jlaw@ventanamicro.com>
gcc/ChangeLog
* config/arc/arc.md (*extvsi_n_0): New define_insn_and_split to
implement SImode sign extract using a AND, XOR and MINUS sequence.
gcc/testsuite/ChangeLog
* gcc.target/arc/extvsi-1.c: New test case.
* gcc.target/arc/extvsi-2.c: Likewise.
This fixes issues reported by David Edelsohn <dje.gcc@gmail.com>, and by
Eric Gallager <egallager@gcc.gnu.org>.
ChangeLog:
* Makefile.def (gettext): Disable (via missing)
{install-,}{pdf,html,info,dvi} and TAGS targets. Set no_install
to true. Add --disable-threads --disable-libasprintf. Drop the
lib_path (as there are no shared libs).
* Makefile.in: Regenerate.
Juzhe-Zhong [Wed, 13 Dec 2023 05:48:11 +0000 (13:48 +0800)]
RISC-V: Postpone full available optimization [VSETVL PASS]
Fix VSETVL BUG that AVL is polluted
.L15:
li a3,9
lui a4,%hi(s)
sw a3,%lo(j)(t2)
sh a5,%lo(s)(a4) <--a4 is hold the address of s
beq t0,zero,.L42
sw t5,8(t4)
vsetvli zero,a4,e8,m8,ta,ma <<--- a4 as avl
Actually, this vsetvl is redundant.
The root cause we include full available optimization in LCM local data computation.
full available optimization should be after LCM computation.
PR target/112929
PR target/112988
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc
(pre_vsetvl::compute_lcm_local_properties): Remove full available.
(pre_vsetvl::pre_global_vsetvl_info): Add full available optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr112929.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112988.c: New test.
For a scalable vectorization with VLEN > 128 bits CPU, it's ok when VLEN = 128.
But, as long as VLEN > 128 bits, it will waste the CPU resources. That is, e.g. VLEN = 256bits.
only half of the vector units are working and another half is idle.
After investigation, I realize that I forgot to adjust COST for SELECT_VL.
So, adjust COST for SELECT_VL styple length vectorization. We adjust COST from 3 to 2. since
after this patch:
foo:
ble a2,zero,.L5
.L3:
vsetvli a5,a2,e16,m1,ta,ma -----> SELECT_VL cost.
vle8.v v2,0(a0)
slli a4,a5,1 -----> additional shift of outcome SELECT_VL for memory address calculation.
vzext.vf2 v1,v2
sub a2,a2,a5
vse16.v v1,0(a1)
add a0,a0,a5
add a1,a1,a4
bne a2,zero,.L3
.L5:
ret
This patch is a simple fix that I previous forgot.
Ok for trunk ?
If not, I am going to adjust cost in backend cost model.
PR target/111317
gcc/ChangeLog:
* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust for COST for decrement IV.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.
Jakub Jelinek [Wed, 13 Dec 2023 10:36:27 +0000 (11:36 +0100)]
lower-bitint: Fix lowering of non-_BitInt to _BitInt cast merged with some wider cast [PR112940]
The following testcase ICEs, because a PHI argument from latch edge
uses a SSA_NAME set only in a conditionally executed block inside of the
loop.
This happens when we have some outer cast which lowers its operand several
times, under some condition with variable index, under different condition
with some constant index, otherwise something else, and then there is
an inner cast from non-_BitInt integer (or small/middle one). Such cast
in certain conditions is emitted by initializing some SSA_NAMEs in the
initialization statements before loops (say for casts from <= limb size
precision by computing a SSA_NAME for the first limb and then extension
of it for the later limbs) and uses the prepare_data_in_out function
to create a PHI node. Such function is passed the value (constant or
SSA_NAME) to use in the PHI argument from the pre-header edge, but for
the latch edge it always created a new SSA_NAME and then caller emitted
in the following 3 spots an extra assignment to set that SSA_NAME to
whatever value we want from the latch edge. In all these 3 cases
the argument from the latch edge is known already before the loop though,
either constant or SSA_NAME computed in pre-header as well.
But the need to emit an assignment combined with the handle_operand done
in a conditional basic block results in the SSA verification failure.
The following patch fixes it by extending the prpare_data_in_out method,
so that when the latch edge argument is known before (constant or computed
in pre-header), we can just use it directly and avoid the extra assignment
that would normally be hopefully optimized away later to what we now emit
directly.
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112940
* gimple-lower-bitint.cc (struct bitint_large_huge): Add another
argument to prepare_data_in_out method defaulted to NULL_TREE.
(bitint_large_huge::handle_operand): Pass another argument to
prepare_data_in_out instead of emitting an assignment to set it.
(bitint_large_huge::prepare_data_in_out): Add VAL_OUT argument.
If non-NULL, use it as PHI argument instead of creating a new
SSA_NAME.
(bitint_large_huge::handle_cast): Pass rext as another argument
to 2 prepare_data_in_out calls instead of emitting assignments
to set them.
Jakub Jelinek [Wed, 13 Dec 2023 10:35:20 +0000 (11:35 +0100)]
attribs: Fix valgrind failures on -Wno-attributes* tests [PR112953]
The r14-6076 change changed the allocation of attribute tables from
table = new attribute_spec[2];
to
table = new attribute_spec { ... };
with
ignored_attributes_table.safe_push (table);
later in both cases, but didn't change the corresponding delete in
free_attr_data, which means valgrind is unhappy about that:
FAIL: c-c++-common/Wno-attributes-2.c -Wc++-compat (test for excess errors)
Excess errors:
==974681== Mismatched free() / delete / delete []
==974681== at 0x484965B: operator delete[](void*) (vg_replace_malloc.c:1103)
==974681== by 0x707434: free_attr_data() (attribs.cc:318)
==974681== by 0xCFF8A4: compile_file() (toplev.cc:454)
==974681== by 0x704D23: do_compile (toplev.cc:2150)
==974681== by 0x704D23: toplev::main(int, char**) (toplev.cc:2306)
==974681== by 0x7064BA: main (main.cc:39)
==974681== Address 0x51dffa0 is 0 bytes inside a block of size 40 alloc'd
==974681== at 0x4845FF5: operator new(unsigned long) (vg_replace_malloc.c:422)
==974681== by 0x70A040: handle_ignored_attributes_option(vec<char*, va_heap, vl_ptr>*) (attribs.cc:301)
==974681== by 0x7FA089: handle_pragma_diagnostic_impl<false, false> (c-pragma.cc:934)
==974681== by 0x7FA089: handle_pragma_diagnostic(cpp_reader*) (c-pragma.cc:1028)
==974681== by 0x75814F: c_parser_pragma(c_parser*, pragma_context, bool*) (c-parser.cc:14707)
==974681== by 0x784A85: c_parser_external_declaration(c_parser*) (c-parser.cc:2027)
==974681== by 0x785223: c_parser_translation_unit (c-parser.cc:1900)
==974681== by 0x785223: c_parse_file() (c-parser.cc:26713)
==974681== by 0x7F6331: c_common_parse_file() (c-opts.cc:1301)
==974681== by 0xCFF87D: compile_file() (toplev.cc:446)
==974681== by 0x704D23: toplev::main(int, char**) (toplev.cc:2306)
==974681== by 0x7064BA: main (main.cc:39)
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112953
* attribs.cc (free_attr_data): Use delete x rather than delete[] x.
Jakub Jelinek [Wed, 13 Dec 2023 10:34:12 +0000 (11:34 +0100)]
i386: Fix ICE on __builtin_ia32_pabsd128 without lhs [PR112962]
The following patch fixes ICE on the testcase in similar way to how
other folded builtins are handled in ix86_gimple_fold_builtin when
they don't have a lhs; these builtins are const or pure, so normally
DCE would remove them later, but with -O0 that isn't guaranteed to
happen, and during expansion if they are marked TREE_SIDE_EFFECTS
it might still be attempted to be expanded.
This removes them right away during the folding.
Initially I wanted to also change all gsi_replace last args in that function
to true, but Andrew pointed to PR107209, so I've kept them as is.
2023-12-13 Jakub Jelinek <jakub@redhat.com>
PR target/112962
* config/i386/i386.cc (ix86_gimple_fold_builtin): For shifts
and abs without lhs replace with nop.
Richard Biener [Wed, 13 Dec 2023 08:05:59 +0000 (09:05 +0100)]
Avoid losing MEM_REF offset in MEM_EXPR adjustment for stack slot sharing
When investigating PR111591 with respect to TBAA and stack slot sharing
I noticed we're eventually scrapping a [TARGET_]MEM_REF offset when
rewriting the VAR_DECL base of the MEM_EXPR to use a pointer to the
partition instead. The following makes sure to preserve that.
* emit-rtl.cc (set_mem_attributes_minus_bitpos): Preserve
the offset when rewriting an exising MEM_REF base for
stack slot sharing.
Richard Biener [Wed, 13 Dec 2023 07:45:58 +0000 (08:45 +0100)]
tree-optimization/112991 - re-do PR112961 fix
The following does away with the fake edge adding as in the original
PR112961 fix and instead exposes handling of entry PHIs as additional
parameter of the region VN run.
Richard Biener [Wed, 13 Dec 2023 08:38:59 +0000 (09:38 +0100)]
tree-optimization/112990 - unsupported VEC_PERM from match pattern
The following avoids creating an unsupported VEC_PERM after vector
lowering from the pattern merging a bit-insert from a bit-field-ref
to a VEC_PERM. For the already existing s390 testcase we get
TImode vectors which later ICE during attempted expansion of
a vec_perm_const.
PR tree-optimization/112990
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..):
Restrict to vector modes after lowering.
Richard Biener [Wed, 13 Dec 2023 07:54:49 +0000 (08:54 +0100)]
middle-end/111591 - explain why TBAA doesn't need adjustment
While tidying the prototype patch I've done for the reduced testcase
in PR111591 and in that process trying to produce a testcase that
is miscompiled by stack slot coalescing and the TBAA info that
remains un-altered I've realized we do not need to adjust TBAA info.
The following documents this in the place we adjust points-to info
which we do need to adjust.
PR middle-end/111591
* cfgexpand.cc (update_alias_info_with_stack_vars): Document
why not adjusting TBAA info on accesses is OK.
Alexandre Oliva [Wed, 13 Dec 2023 04:31:41 +0000 (01:31 -0300)]
multiflags: fix doc warning properly
Rather than a dubious fix for a dubious warning, namely adding a
period after a parenthesized @xref because the warning demands it, use
@pxref that is meant for exactly this case. Thanks to Joseph Myers
for introducing me to it.
for gcc/ChangeLog
* doc/invoke.texi (multiflags): Drop extraneous period, use
@pxref instead.
aarch64: Implement the ACLE instruction/data prefetch functions.
Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:
1. Data prefetch intrinsics:
----------------------------
void __pldx (/*constant*/ unsigned int /*access_kind*/,
/*constant*/ unsigned int /*cache_level*/,
/*constant*/ unsigned int /*retention_policy*/,
void const volatile *addr);
void __pld (void const volatile *addr);
2. Instruction prefetch intrinsics:
-----------------------------------
void __plix (/*constant*/ unsigned int /*cache_level*/,
/*constant*/ unsigned int /*retention_policy*/,
void const volatile *addr);
void __pli (void const volatile *addr);
`__pldx' affords the programmer more fine-grained control over the
data prefetch behaviour than the analogous GCC builtin
`__builtin_prefetch', and allows access to the "SLC" cache level.
While `__builtin_prefetch' chooses both cache-level and retention
policy automatically via the optional `locality' parameter, `__pldx'
expects 2 (mandatory) arguments to explicitly define the desired
cache-level and retention policies.
`__plix' on the other hand, generates a code prefetch instruction and
so extends functionality on aarch64 targets beyond that which is
exposed by `builtin_prefetch'.
`__pld' and `__pli' do prefetch of data and instructions,
respectively, using default values for both cache-level and retention
policies.
Bootstrapped and tested on aarch64-none-linux-gnu.
Kewen Lin [Wed, 13 Dec 2023 02:39:34 +0000 (20:39 -0600)]
range: Workaround different type precision between _Float128 and long double [PR112788]
As PR112788 shows, on rs6000 with -mabi=ieeelongdouble type _Float128
has the different type precision (128) from that (127) of type long
double, but actually they has the same underlying mode, so they have
the same precision as the mode indicates the same real type format
ieee_quad_format.
It's not sensible to have such two types which have the same mode but
different type precisions, some fix attempt was posted at [1].
As the discussion there, there are some historical reasons and
practical issues. Considering we passed stage 1 and it also affected
the build as reported, this patch is trying to temporarily workaround
it. I thought to introduce a hookpod but that seems a bit overkill,
assuming scalar float type with the same mode should have the same
precision looks sensible.
* value-range.h (range_compatible_p): Workaround same type mode but
different type precision issue for rs6000 scalar float types
_Float128 and long double.
Jiufu Guo [Wed, 13 Dec 2023 00:10:25 +0000 (08:10 +0800)]
rs6000: using pli for constant splitting
For constant building e.g. r120=0x66666666, which does not fit 'li or lis',
'pli' is used to build this constant via 'emit_move_insn'.
While for a complicated constant, e.g. 0x6666666666666666ULL, when using
'rs6000_emit_set_long_const' to split the constant recursively, it fails to
use 'pli' to build the half part constant: 0x66666666.
'rs6000_emit_set_long_const' could be updated to use 'pli' to build half
part of the constant when necessary. For example: 0x6666666666666666ULL,
"pli 3,1717986918; rldimi 3,3,32,0" can be used.
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add code to use
pli for 34bit constant.
Jiufu Guo [Wed, 13 Dec 2023 00:10:25 +0000 (08:10 +0800)]
rs6000: accurate num_insns_constant_gpr
Trunk gcc supports more constants to be built via two instructions:
e.g. "li/lis; xori/xoris/rldicl/rldicr/rldic".
And then num_insns_constant should also be updated.
Function "rs6000_emit_set_long_const" is used to build complicated
constants; and "num_insns_constant_gpr" is used to compute 'how
many instructions are needed" to build the constant. So, these
two functions should be aligned.
The idea of this patch is: to reuse "rs6000_emit_set_long_const" to
compute/record the instruction number(when computing the insn_num,
then do not emit instructions).
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add new
parameter to record number of instructions to build the constant.
(num_insns_constant_gpr): Call rs6000_emit_set_long_const to compute
num_insn.
Juzhe-Zhong [Tue, 12 Dec 2023 14:25:52 +0000 (22:25 +0800)]
RISC-V: Apply vla vs. vls mode heuristic vector COST model
This patch apply vla vs. vls mode heuristic which can fixes the following FAILs:
FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize
scan-assembler-not vset
FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize
scan-assembler-times li\\s+[a-x0-9]+,0\\s+ret 2
The root cause of this FAIL is we failed to pick VLS mode for the vectorization.
The heuristic leverage ARM SVE and fully tested and confirm we have same behavior
as ARM SVE GCC and RVV Clang.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (costs::analyze_loop_vinfo): New function.
(costs::record_potential_vls_unrolling): Ditto.
(costs::prefer_unrolled_loop): Ditto.
(costs::better_main_loop_than_p): Ditto.
(costs::add_stmt_cost): Ditto.
* config/riscv/riscv-vector-costs.h (enum cost_type_enum): New enum.
* config/riscv/t-riscv: Add new include files.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr111313.c: Adapt test.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: New test.
Jonathan Wakely [Tue, 12 Dec 2023 20:53:08 +0000 (20:53 +0000)]
libstdc++: Fix std::format("{}", 'c')
When I added a fast path for std::format("{}", x) in r14-5587-g41a5ea4cab2c59 I forgot to handle char separately from other
integral types. That caused std::format("{}", 'c') to return "99"
instead of "c".
libstdc++-v3/ChangeLog:
* include/std/format (__do_vformat_to): Handle char separately
from other integral types.
* testsuite/std/format/functions/format.cc: Check for expected
output for char and bool arguments.
* testsuite/std/format/string.cc: Check that 0 filling is
rejected for character and string formats.
Jonathan Wakely [Mon, 11 Dec 2023 15:33:59 +0000 (15:33 +0000)]
libstdc++: Fix std::format output of %C for negative years
During discussion of LWG 4022 I noticed that we do not correctly
implement floored division for the century. We were just truncating
towards zero, rather than applying the floor function. For negative
values that rounds the wrong way.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_C_y_Y): Fix
rounding for negative centuries.
* testsuite/std/time/year/io.cc: Check %C for negative years.
Jonathan Wakely [Tue, 12 Dec 2023 14:54:36 +0000 (14:54 +0000)]
libstdc++: Remove redundant -std flags from Makefile
In r14-4060-gc4baeaecbbf7d0 I moved some files from src/c++98 to
src/c++11 but I didn't remove the redundant -std=gnu++11 flags for those
files. The flags aren't needed now, because AM_CXXFLAGS for that
directory already uses -std=gnu++11. This removes them.
Martin Jambor [Tue, 12 Dec 2023 20:19:21 +0000 (21:19 +0100)]
SRA: Force gimple operand in an additional corner case (PR 112822)
PR 112822 revealed a corner case in load_assign_lhs_subreplacements
where it creates invalid gimple: an assignment where on the LHS there
is a complex variable which however is not a gimple register because
it has partial defs and on the right hand side there is a
VIEW_CONVERT_EXPR. This patch invokes force_gimple_operand_gsi on
such statements (like it already does when both sides of a generated
assignment have partial definitions.
gcc/ChangeLog:
2023-12-12 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/112822
* tree-sra.cc (load_assign_lhs_subreplacements): Invoke
force_gimple_operand_gsi also when LHS has partial stores and RHS is a
VIEW_CONVERT_EXPR.
Szabolcs Nagy [Thu, 15 Jun 2023 16:15:09 +0000 (17:15 +0100)]
aarch64,arm: Fix branch-protection= parsing
Refactor the parsing to have a single API and fix a few parsing issues:
- Different handling of "bti+none" and "none+bti": these should be
rejected because "none" can only appear alone.
- Accepted empty strings such as "bti++pac-ret" or "bti+", this bug
was caused by using strtok_r.
- Memory got leaked (str_root was never freed). And two buffers got
allocated when one is enough.
The callbacks now have no failure mode, only parsing can fail and
all failures are handled locally. The "-mbranch-protection=" vs
"target("branch-protection=")" difference in the error message is
handled by a separate argument to aarch_validate_mbranch_protection.
Richard Biener [Mon, 11 Dec 2023 13:39:48 +0000 (14:39 +0100)]
tree-optimization/112736 - avoid overread with non-grouped SLP load
The following aovids over/under-read of storage when vectorizing
a non-grouped load with SLP. Instead of forcing peeling for gaps
use a smaller load for the last vector which might access excess
elements. This builds upon the existing optimization avoiding
peeling for gaps, generalizing it to all gap widths leaving a
power-of-two remaining number of elements (but it doesn't replace
or improve that particular case at this point).
I wonder if the poly relational compares I set up are good enough
to guarantee /* remain should now be > 0 and < nunits. */.
There is existing test coverage that runs into /* DR will be unused. */
always when the gap is wider than nunits. Compared to the
existing gap == nunits/2 case this only adjusts the load that will
cause the overrun at the end, not every load. Apart from the
poly relational compares it should reliably cover these cases but
I'll leave it for stage1 to remove.
PR tree-optimization/112736
* tree-vect-stmts.cc (vectorizable_load): Extend optimization
to avoid peeling for gaps to handle single-element non-groups
we now allow with SLP.
Richard Biener [Mon, 11 Dec 2023 09:08:24 +0000 (10:08 +0100)]
ipa/92606 - properly handle no_icf attribute for variables
The following adds no_icf handling for variables where the attribute
was rejected. It also fixes the check for no_icf by checking both
the source and the targets decl.
PR ipa/92606
gcc/c-family/
* c-attribs.cc (handle_noicf_attribute): Also allow the
attribute on global variables.
gcc/
* ipa-icf.cc (sem_item_optimizer::merge_classes): Check
both source and alias for the no_icf attribute.
* doc/extend.texi (no_icf): Document variable attribute.
Richard Biener [Tue, 12 Dec 2023 13:01:47 +0000 (14:01 +0100)]
tree-optimization/112961 - include latch in if-conversion CSE
The following makes sure to also process the (empty) latch when
performing CSE on the if-converted loop body. That's important
to get all uses of copies propagated out on the backedge as well.
To avoid CSE on the PHI nodes itself which is prohibitive
(see PR90402) this temporarily adds a fake entry edge to the loop.
PR tree-optimization/112961
* tree-if-conv.cc (tree_if_conversion): Instead of excluding
the latch block from VN, add a fake entry edge.
Xi Ruoyao [Fri, 24 Nov 2023 03:08:19 +0000 (11:08 +0800)]
Only allow (int)trunc(x) to (int)x simplification with -ffp-int-builtin-inexact [PR107723]
With -fno-fp-int-builtin-inexact, trunc is not allowed to raise
FE_INEXACT and it should produce an integral result (if the input is not
NaN or Inf). Thus FE_INEXACT should not be raised.
But (int)x may raise FE_INEXACT when x is a non-integer, non-NaN, and
non-Inf value. C23 recommends to do so in a footnote.
Thus we should not simplify (int)trunc(x) to (int)x if
-fno-fp-int-builtin-inexact is in-effect.
gcc/ChangeLog:
PR middle-end/107723
* convert.cc (convert_to_integer_1) [case BUILT_IN_TRUNC]: Break
early if !flag_fp_int_builtin_inexact and flag_trapping_math.
gcc/testsuite/ChangeLog:
PR middle-end/107723
* gcc.dg/torture/builtin-fp-int-inexact-trunc.c: New test.