Richard Biener [Wed, 5 Mar 2025 13:24:50 +0000 (14:24 +0100)]
debug/101533 - ICE with variant typedef DIE generation
There's a sanity check in gen_type_die_with_usage that trips
unnecessarily for a case where the relevant DIE has already been
generated successfully in other ways. The following keys the
existing TREE_ASM_WRITTEN check on the correct object, honoring
this and does nothing instead of ICEing for the testcase at hand.
PR debug/101533
* dwarf2out.cc (gen_type_die_with_usage): When we have
output the typedef already do nothing for a typedef variant.
Do not set TREE_ASM_WRITTEN on the type.
Richard Biener [Wed, 31 Jul 2024 08:07:45 +0000 (10:07 +0200)]
middle-end/101478 - ICE with degenerate address during gimplification
When we gimplify &MEM[0B + 4] we are re-folding the address in case
types are not canonical which ends up with a constant address that
recompute_tree_invariant_for_addr_expr ICEs on. Properly guard
that call.
PR middle-end/101478
* gimplify.cc (gimplify_addr_expr): Check we still have an
ADDR_EXPR before calling recompute_tree_invariant_for_addr_expr.
Richard Biener [Mon, 17 Feb 2025 14:53:11 +0000 (15:53 +0100)]
tree-optimization/98845 - ICE with tail-merging and DCE/DSE disabled
The following shows that tail-merging will make dead SSA defs live
in paths where they weren't before, possibly introducing UB or, as
in this case, uses of abnormals that eventually fail coalescing
later. The fix is to register such defs for stmt comparison.
PR tree-optimization/98845
* tree-ssa-tail-merge.cc (stmt_local_def): Consider a
def with no uses not local.
* gcc.dg/pr98845.c: New testcase.
* gcc.dg/pr81192.c: Adjust.
Richard Biener [Fri, 28 Feb 2025 13:09:29 +0000 (14:09 +0100)]
lto/91299 - weak definition inlined with LTO
The following fixes a thinko in the handling of interposed weak
definitions which confused the interposition check in
get_availability by setting DECL_EXTERNAL too early.
PR lto/91299
gcc/lto/
* lto-symtab.cc (lto_symtab_merge_symbols): Set DECL_EXTERNAL
only after calling get_availability.
gcc/testsuite/
* gcc.dg/lto/pr91299_0.c: New testcase.
* gcc.dg/lto/pr91299_1.c: Likewise.
Richard Biener [Fri, 28 Feb 2025 09:36:11 +0000 (10:36 +0100)]
tree-optimization/87984 - hard register assignments not preserved
The following disables redundant store elimination to hard register
variables which isn't valid.
PR tree-optimization/87984
* tree-ssa-dom.cc (dom_opt_dom_walker::optimize_stmt): Do
not perform redundant store elimination to hard register
variables.
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt):
Likewise.
When the C++ frontend clones a CTOR we do not copy ASM_EXPR constraints
fully as walk_tree does not recurse to TREE_PURPOSE of TREE_LIST nodes.
At this point doing that seems too dangerous, so the following instead
keeps gimplification of ASM_EXPRs from clobbering the shared constraints
by unsharing them there, just as it already unshares TREE_VALUE when it
rewrites a "+" output constraint into a separate "=" output and a matching
input constraint.
PR middle-end/66279
* gimplify.cc (gimplify_asm_expr): Copy TREE_PURPOSE before
rewriting it for "+" processing.
Marek Polacek [Tue, 25 Mar 2025 17:36:24 +0000 (13:36 -0400)]
c++: fix missing lifetime extension [PR119383]
Since r15-8011 cp_build_indirect_ref_1 won't do the *&TARGET_EXPR ->
TARGET_EXPR folding, so as not to change its value category. That fix seems
correct but it made us stop extending the lifetime in this testcase,
causing a wrong-code issue -- extend_ref_init_temps_1 did not see
through the extra *& because it doesn't use a tree walk.
This patch reverts r15-8011 and instead handles the problem in
build_over_call by calling force_lvalue in the is_really_empty_class
case as well as in the general case.
PR c++/119383
gcc/cp/ChangeLog:
* call.cc (build_over_call): Use force_lvalue to ensure op= returns
an lvalue.
* cp-tree.h (force_lvalue): Declare.
* cvt.cc (force_lvalue): New.
* typeck.cc (cp_build_indirect_ref_1): Revert r15-8011.
Jonathan Wakely [Tue, 1 Apr 2025 10:02:43 +0000 (11:02 +0100)]
libstdc++: Avoid bogus -Walloc-size-larger-than warning in test [PR116212]
The compiler can't tell that the vector size fits in int, so it thinks
it might overflow to a negative value, which would then be a huge
positive size_t. In reality, the vector size never exceeds five.
There's no warning on trunk, so just change the local variable to use
type unsigned so that we get rid of the warning on the branches.
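As a standalone illustration of the conversion the warning worries about (this is not the test's code, and `as_alloc_size` is a hypothetical helper), a negative int used as a size becomes a huge size_t:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: models an int element count being used as an
// allocation size, which converts it to std::size_t.
constexpr std::size_t as_alloc_size(int n) {
  return static_cast<std::size_t>(n);
}

// A negative count wraps around to the largest size_t value.
static_assert(as_alloc_size(-1) == std::numeric_limits<std::size_t>::max());
// A count that really is at most five converts unchanged.
static_assert(as_alloc_size(5) == 5u);
```

With an unsigned count there is no negative value to wrap, which is presumably why changing the variable's type is enough to silence the warning.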
libstdc++-v3/ChangeLog:
PR libstdc++/116212
* testsuite/20_util/specialized_algorithms/uninitialized_move/constrained.cc:
Use unsigned for vector size.
Martin Jambor [Fri, 14 Mar 2025 15:07:01 +0000 (16:07 +0100)]
ipa: Do not modify cgraph edges from thunk clones during inlining (PR116572)
In PR 116572 we hit an assert that a thunk which does not have a body
looks like it has one. It does not, but the call_stmt of its outgoing
edge points to a statement, which should not. In fact it has several
outgoing call graph edges, which cannot be. The problem is that the
code updating the edges to reflect inlining into the master clone (an
ex-thunk, unlike the clone, which is still an unexpanded thunk) is
being updated during inlining into the master clone. This patch simply
makes the code skip unexpanded thunk clones.
gcc/ChangeLog:
2025-03-13 Martin Jambor <mjambor@suse.cz>
PR ipa/116572
* cgraph.cc (cgraph_update_edges_for_call_stmt): Do not update
edges of clones that are unexpanded thunk. Assert that the node
passed as the parameter is not an unexpanded thunk.
Jonathan Wakely [Mon, 10 Mar 2025 14:29:36 +0000 (14:29 +0000)]
libstdc++: Add static_assert to std::packaged_task::packaged_task(F&&)
LWG 4154 (approved in Wrocław, November 2024) fixed the Mandates:
precondition for std::packaged_task::packaged_task(F&&) to match what
the implementation actually requires. We already gave a diagnostic in
the right cases as required by the issue resolution, so strictly
speaking we don't need to do anything. But the current diagnostic comes
from inside the implementation of std::__invoke_r and could be more
user-friendly.
For C++17 (when std::is_invocable_r_v is available) add a static_assert
to the constructor, so the error is clear:
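The sketch below only illustrates the kind of check described. It is not the libstdc++ source and does not reproduce the actual diagnostic; `checked_task` is a hypothetical stand-in for std::packaged_task, the assertion message is invented, and the `std::decay_t<F>&` form of the condition is an assumption about the LWG 4154 wording:

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical wrapper, used only to show a constructor-level check.
template <typename Signature>
class checked_task;

template <typename R, typename... Args>
class checked_task<R(Args...)> {
  std::function<R(Args...)> fn_;

public:
  template <typename F>
  explicit checked_task(F&& f) : fn_(std::forward<F>(f)) {
    // Fail early with a readable message instead of deep inside the
    // invocation machinery. The message text is made up for this sketch.
    static_assert(std::is_invocable_r_v<R, std::decay_t<F>&, Args...>,
                  "callable must be invocable with Args and return R");
  }

  R operator()(Args... args) { return fn_(std::forward<Args>(args)...); }
};
```

A conforming callable constructs and runs as usual; a non-matching one is rejected at the constructor.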
Iain Buclaw [Sat, 29 Mar 2025 22:16:25 +0000 (23:16 +0100)]
d: Fix error with -Warray-bounds and -O2 [PR117002]
The record layout of class types in D doesn't get any tail padding, so it
is possible for the `classInstanceSize' to not be a multiple of the
`classInstanceAlignment'.
Rather than setting the instance alignment on the underlying
RECORD_TYPE, instead give the type an alignment of 1, which will mark it
as TYPE_PACKED. The value of `classInstanceAlignment' is instead
applied to the DECL_ALIGN of both the static `init' symbol, and the
stack allocated variable used when generating `new' for a `scope' class.
PR d/117002
gcc/d/ChangeLog:
* decl.cc (aggregate_initializer_decl): Set explicit decl alignment of
class instance.
* expr.cc (ExprVisitor::visit (NewExp *)): Likewise.
* types.cc (TypeVisitor::visit (TypeClass *)): Mark the record type of
classes as packed.
Jonathan Wakely [Thu, 6 Mar 2025 21:18:21 +0000 (21:18 +0000)]
libstdc++: Make std::erase for linked lists convert to bool
LWG 4135 (approved in Wrocław, November 2024) fixes the lambda
expressions used by std::erase for std::list and std::forward_list.
Previously they attempted to copy something that isn't required to be
copyable. Instead they should convert it to bool right away.
The issue resolution also changes the lambda's parameter to be const, so
that it can't modify the elements while comparing them.
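A minimal sketch of the lambda shape described above, not the libstdc++ code: `Verdict`, `Item` and `erase_equal` are hypothetical names, and list::remove_if stands in for std::erase so the sketch also builds as C++17:

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical comparison result: convertible to bool, but not copyable.
struct Verdict {
  explicit Verdict(bool b) : value(b) {}
  Verdict(const Verdict&) = delete;
  operator bool() const { return value; }
  bool value;
};

// Hypothetical element type whose operator== returns Verdict, not bool.
struct Item {
  int id;
  friend Verdict operator==(const Item& a, const Item& b) {
    return Verdict(a.id == b.id);
  }
};

// Remove every element equal to `value` using a predicate with a const
// parameter and an explicit bool return type, so the comparison result is
// converted to bool right away and the element cannot be modified.
std::size_t erase_equal(std::list<Item>& items, const Item& value) {
  const std::size_t before = items.size();
  items.remove_if([&](const auto& elem) -> bool { return elem == value; });
  return before - items.size();
}
```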
libstdc++-v3/ChangeLog:
* include/std/forward_list (erase): Change lambda to have
explicit return type and const parameter type.
* include/std/list (erase): Likewise.
* testsuite/23_containers/forward_list/erasure.cc: Check lambda
is correct.
* testsuite/23_containers/list/erasure.cc: Likewise.
Jonathan Wakely [Tue, 25 Mar 2025 13:24:08 +0000 (13:24 +0000)]
libstdc++: Optimize std::vector construction from input iterators [PR108487]
LWG 3291 made std::ranges::iota_view's iterator have input_iterator_tag
as its iterator_category, even though it satisfies the C++20
std::forward_iterator concept. This means that the traditional
std::vector::vector(InputIterator, InputIterator) constructor treats
iota_view iterators as input iterators, because it only understands the
C++17 iterator requirements, not the C++20 iterator concepts. This
results in a loop that calls emplace_back for each individual element of
the iota_view, requiring the vector to reallocate repeatedly as the
values are inserted. This makes it unnecessarily slow to construct a
vector from an iota_view.
This change adds a new _M_range_initialize_n function for initializing a
vector from a range (which doesn't have to be common) and a size. This
new function can be used by vector(InputIterator, InputIterator) when
std::ranges::distance can be used to get the size. It can also be used
by the _M_range_initialize overload that gets the size for a
Cpp17ForwardIterator pair using std::distance, and by the
vector(initializer_list) constructor.
With this new function constructing a std::vector from iota_view does
a single allocation of the correct size and so doesn't need to
reallocate in a loop.
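To show why knowing the size up front matters, here is a user-level sketch that counts allocations. It does not use _M_range_initialize_n or iota_view; `CountingAlloc` and both helper functions are hypothetical, and reserve() stands in for the single correctly sized allocation:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Bumped by every allocation made through CountingAlloc.
inline std::size_t g_allocations = 0;

// Hypothetical minimal allocator that counts calls to allocate().
template <typename T>
struct CountingAlloc {
  using value_type = T;
  CountingAlloc() = default;
  template <typename U>
  CountingAlloc(const CountingAlloc<U>&) {}
  T* allocate(std::size_t n) {
    ++g_allocations;
    return std::allocator<T>{}.allocate(n);
  }
  void deallocate(T* p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }
};
template <typename T, typename U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

// Element-by-element growth, like the emplace_back loop used for input
// iterators: the vector reallocates as it grows.
inline std::size_t allocations_when_growing(int n) {
  g_allocations = 0;
  std::vector<int, CountingAlloc<int>> v;
  for (int i = 0; i < n; ++i)
    v.push_back(i);
  return g_allocations;
}

// Size known first: one allocation, then the elements are filled in.
inline std::size_t allocations_with_known_size(int n) {
  g_allocations = 0;
  std::vector<int, CountingAlloc<int>> v;
  v.reserve(static_cast<std::size_t>(n));
  for (int i = 0; i < n; ++i)
    v.push_back(i);
  return g_allocations;
}
```

For a non-empty range the second function allocates exactly once, while the first allocates several times as capacity grows.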
libstdc++-v3/ChangeLog:
PR libstdc++/108487
* include/bits/stl_vector.h (vector(initializer_list)): Call
_M_range_initialize_n instead of _M_range_initialize.
(vector(InputIterator, InputIterator)): Use _M_range_initialize_n
for C++20 sized sentinels and forward iterators.
(vector::_M_range_initialize(FwIt, FwIt, forward_iterator_tag)):
Use _M_range_initialize_n.
(vector::_M_range_initialize_n): New function.
* testsuite/23_containers/vector/cons/108487.cc: New test.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/initlist-opt1.C: Match _M_range_initialize_n
instead of _M_range_initialize.
* g++.dg/tree-ssa/initlist-opt2.C: Likewise.
Jonathan Wakely [Fri, 12 Jan 2024 16:57:41 +0000 (16:57 +0000)]
libstdc++: Update tzdata to 2025a
Import the new 2025a tzdata.zi file. The leapseconds file was also
updated to have a new expiry (no new leap seconds were added).
libstdc++-v3/ChangeLog:
* include/std/chrono (__detail::__get_leap_second_info): Update
expiry date for leap seconds list.
* src/c++20/tzdata.zi: Import new file from 2025a release.
* src/c++20/tzdb.cc (tzdb_list::_Node::_S_read_leap_seconds):
Update expiry date for leap seconds list.
Martin Uecker [Sat, 23 Nov 2024 07:04:05 +0000 (08:04 +0100)]
Fix type compatibility for types with flexible array member 2/2 [PR113688,PR114713,PR117724]
For checking or computing TYPE_CANONICAL, ignore the array size when it is
the last element of a structure or union. To not get errors because of
an inconsistent number of members, zero-sized arrays which are the last
element are not ignored anymore when checking the fields of a struct.
LRA substitutes all scratches with new pseudos, so we have:
(insn 484 483 485 72 (parallel [
(set (reg/v:SI 143 [ __q1 ])
(plus:SI (reg/v:SI 143 [ __q1 ])
(const_int -2 [0xfffffffffffffffe])))
(clobber (reg:QI 619))
]) "/mnt/d/avr-lra/udivmoddi.c":163:405 discrim 5 186 {addsi3}
(expr_list:REG_UNUSED (reg:QI 619)
(nil)))
Pseudo 619 is a special scratch register generated by LRA which is marked in `scratch_bitmap' and can be tested by calling `ira_former_scratch_p(regno)'.
In dump file (udivmoddi.c.317r.reload) we have:
Creating newreg=619
Removing SCRATCH to p619 in insn #484 (nop 3)
rescanning insn with uid = 484.
After that LRA tries to spill (reg:QI 619).
It's a bug because (reg:QI 619) is an output scratch register, which is already something like a spill register.
Fragment from udivmoddi.c.317r.reload:
Choosing alt 2 in insn 484: (0) r (1) 0 (2) nYnn (3) &d {addsi3}
Creating newreg=728 from oldreg=619, assigning class LD_REGS to r728
IMHO: the bug is in lra-constraints.cc in function `get_reload_reg'
fragment of `get_reload_reg':
  if (type == OP_OUT)
    {
      /* Output reload registers tend to start out with a conservative
         choice of register class. Usually this is ALL_REGS, although
         a target might narrow it (for performance reasons) through
         targetm.preferred_reload_class. It's therefore quite common
         for a reload instruction to require a more restrictive class
         than the class that was originally assigned to the reload register.
         In these situations, it's more efficient to refine the choice
         of register class rather than create a second reload register.
         This also helps to avoid cycling for registers that are only
         used by reload instructions. */
      if (REG_P (original)
          && (int) REGNO (original) >= new_regno_start
          && INSN_UID (curr_insn) >= new_insn_uid_start
__________________________________^^
          && in_class_p (original, rclass, &new_class, true))
        {
          unsigned int regno = REGNO (original);
          if (lra_dump_file != NULL)
            {
              fprintf (lra_dump_file, " Reuse r%d for output ", regno);
              dump_value_slim (lra_dump_file, original, 1);
            }
This condition incorrectly limits register reuse to ONLY newly generated instructions,
i.e. LRA can reuse registers only from insns generated by itself.
IMHO: It's wrong.
Scratch registers generated by LRA also have to be reused.
The patch is very simple.
On x86_64, it bootstraps+regtests fine.
Jakub Jelinek [Wed, 26 Mar 2025 13:03:50 +0000 (14:03 +0100)]
widening_mul: Fix up further r14-8680 widening mul issues [PR119417]
The following testcase is miscompiled since r14-8680 PR113560 changes.
I've already tried to fix some of the issues caused by that change in
r14-8823 PR113759, but apparently didn't get it right.
The problem is that the r14-8680 changes sometimes set *type_out to
a narrower type than the *new_rhs_out actually has (because it will
handle stuff like _1 = rhs1 & 0xffff; and imply from that HImode type_out).
Now, if in convert_mult_to_widen or convert_plusminus_to_widen we actually
get optab for the modes we've asked for (i.e. with from_mode and to_mode),
everything works fine, if the operands don't have the expected types,
they are converted to those (for INTEGER_CSTs with fold_convert,
otherwise with build_and_insert_cast).
On the following testcase on aarch64 that is not the case, we ask
for from_mode HImode and to_mode DImode, but get actual_mode SImode.
The mult_rhs1 operand already has SImode and we change type1 to unsigned int
and so no cast is actually done, except that the & 0xffff is lost that way.
The following patch ensures that if we change typeN because of wider
actual_mode (or because of a sign change), we first cast to the old
typeN (if the r14-8680 code was encountered, otherwise it would have the
same precision) and only then change it, and then perhaps cast again.
On the testcase on aarch64-linux the patch results in the expected
- add x19, x19, w0, uxtw 1
+ add x19, x19, w0, uxth 1
difference.
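A small sketch of what losing the & 0xffff means at the source level. This is not the PR's testcase; both function names are hypothetical, and the uxth/uxtw labels are only an analogy to the instructions quoted above:

```cpp
#include <cassert>
#include <cstdint>

// Keeps only the low 16 bits of w before widening and scaling by two
// (the behaviour the uxth form corresponds to).
constexpr std::uint64_t add_low16_scaled(std::uint64_t base, std::uint32_t w) {
  return base + (static_cast<std::uint64_t>(w & 0xffffu) << 1);
}

// The same computation with the mask lost: all 32 bits of w are used
// (the behaviour the uxtw form corresponds to).
constexpr std::uint64_t add_all32_scaled(std::uint64_t base, std::uint32_t w) {
  return base + (static_cast<std::uint64_t>(w) << 1);
}

// The two only agree while w fits in 16 bits.
static_assert(add_low16_scaled(0, 0x2345u) == add_all32_scaled(0, 0x2345u));
static_assert(add_low16_scaled(0, 0x12345u) == 0x468au);
static_assert(add_all32_scaled(0, 0x12345u) == 0x2468au);
```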
2025-03-26 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119417
* tree-ssa-math-opts.cc (convert_mult_to_widen): Before changing
typeN because actual_precision/from_unsignedN differs cast rhsN
to typeN if it has a different type.
(convert_plusminus_to_widen): Before changing
typeN because actual_precision/from_unsignedN differs cast mult_rhsN
to typeN if it has a different type.
Jakub Jelinek [Wed, 26 Mar 2025 07:47:20 +0000 (08:47 +0100)]
i386: Require in peephole2 that memory is offsettable [PR119450]
The following testcase ICEs because a peephole2 attempts to offset
memory which is not offsettable (in particular address is a ZERO_EXTEND
in this case).
Because peephole2s don't support constraints, I've added a check for this
in the peephole2's condition.
2025-03-26 Jakub Jelinek <jakub@redhat.com>
PR target/119450
* config/i386/i386.md (narrow test peephole2): Test for
offsettable_memref_p in condition.
Jakub Jelinek [Sat, 22 Mar 2025 07:39:38 +0000 (08:39 +0100)]
Fix up some further cases of missing or extraneous spaces in diagnostics
Given the recent PR119406 I've tried to grep for concatenated string
literals without space at the end of one line and at the start of next line,
unless it was obviously intentional.
Furthermore, I've then looked through gcc.pot looking for 2 adjacent spaces
and looking back if that wasn't the case of "something "
" with spaces at both sides".
Here is the result from that.
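For illustration, the two patterns being searched for look like this in isolation. The message text is made up for this sketch and is not taken from any GCC diagnostic:

```cpp
#include <cassert>
#include <string>

// Adjacent string literals are concatenated with nothing in between, so a
// missing space at the line break glues two words together.
inline std::string missing_space() {
  return "expected a constant"
         "expression here";
}

// A space on both sides of the break yields two adjacent spaces.
inline std::string extra_space() {
  return "expected a constant "
         " expression here";
}

// Exactly one space, on one side of the break.
inline std::string fixed() {
  return "expected a constant "
         "expression here";
}
```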
I think just the c.opt change needs an explanation, the "" in the
description is simply eaten up somewhere during the option processing and
gcc -v --help before this patch was displaying
-Wdeprecated-literal-operator Warn about deprecated space between and suffix in a user-defined literal operator.
2025-03-22 Jakub Jelinek <jakub@redhat.com>
gcc/
* gimplify.cc (warn_switch_unreachable_and_auto_init_r): Add missing
space in the middle of diagnostics.
* tree-vect-stmts.cc (vectorizable_load): Add missing space in the
middle of debug dump message.
gcc/fortran/
* resolve.cc (resolve_procedure_expression): Remove extraneous space
from the middle of diagnostics.
Jakub Jelinek [Wed, 12 Mar 2025 23:42:54 +0000 (00:42 +0100)]
c++: Evaluate immediate invocation call arguments with mce_true [PR119150]
Since Marek's r14-4140 which moved immediate invocation evaluation
from build_over_call to cp_fold_r, the following testcase is miscompiled.
The a = foo (bar ()); case is actually handled right, that is handled
in cp_fold_r and the whole CALL_EXPR is at that point evaluated by
cp_fold_immediate_r with cxx_constant_value (stmt, tf_none);
and that uses mce_true for evaluation of the argument as well as the actual
call.
But in the bool b = foo (bar ()); case we actually try to evaluate this
as non-manifestly constant-evaluated. And while
      /* Make sure we fold std::is_constant_evaluated to true in an
         immediate function. */
      if (DECL_IMMEDIATE_FUNCTION_P (fun))
        call_ctx.manifestly_const_eval = mce_true;
ensures that if consteval and __builtin_is_constant_evaluated () is true
inside of that call, this happens after arguments to the function
have been already constant evaluated in cxx_bind_parameters_in_call.
The call_ctx in that case also includes new call_ctx.call, something that
shouldn't be used for the arguments, so the following patch just arranges
to call cxx_bind_parameters_in_call with manifestly_constant_evaluated =
mce_true.
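A reduced sketch of the two shapes discussed, not the PR's testcase. The function names are reused from the description for readability, `immediate_arg` and `plain_init` are hypothetical, and the consteval part is guarded so the sketch also compiles before C++20:

```cpp
#include <cassert>

// True only when evaluated as a manifestly constant-evaluated expression.
constexpr bool bar() { return __builtin_is_constant_evaluated(); }

#if __cpp_consteval >= 201811L
// Immediate function: every call must produce a constant.
consteval bool foo(bool x) { return x; }

// In a context that is constant-evaluated anyway, the argument sees true.
static_assert(foo(bar()));

// The bool b = foo (bar ()); shape from the description: with the patch the
// argument is evaluated with mce_true here as well.
inline bool immediate_arg() {
  bool b = foo(bar());
  return b;
}
#endif

// No immediate invocation involved: an ordinary runtime initialization
// is not manifestly constant-evaluated, so bar() yields false.
inline bool plain_init() {
  bool b = bar();
  return b;
}
```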
2025-03-13 Jakub Jelinek <jakub@redhat.com>
PR c++/119150
* constexpr.cc (cxx_eval_call_expression): For
DECL_IMMEDIATE_FUNCTION_P (fun) set manifestly_const_eval in new_ctx
and new_call to mce_true and set ctx to &new_ctx.
Jakub Jelinek [Wed, 12 Mar 2025 07:27:17 +0000 (08:27 +0100)]
builtins: Fix up strspn/strcspn folding [PR119219]
The PR119204 r15-7955 fix caused some regressions.
The problem is that the fold_builtin* APIs document that expr is
either a CALL_EXPR of the call or NULL, so using TREE_TYPE (expr)
can crash e.g. during constexpr evaluation etc.
As can be seen in the surrounding patch, for the neighbouring builtins
(both modf and strpbrk) fold_builtin_2 passes down type, which is the
result type, TREE_TYPE (TREE_TYPE (fndecl)) and those builtins use it
to build the return value, while strspn was always building size_type_node
and strcspn had this change from that to TREE_TYPE (expr).
The patch passes type to these two and uses it there as well.
The patch keeps passing expr because it is used in the
check_nul_terminated_array calls done for both strspn and strcspn,
those calls clearly can deal with NULL expr but prefer if it is non-NULL
for some warning.
2025-03-12 Jakub Jelinek <jakub@redhat.com>
PR middle-end/119204
PR middle-end/119219
* builtins.cc (fold_builtin_2): Pass type as another argument
to fold_builtin_strspn and fold_builtin_strcspn.
(fold_builtin_strspn): Add type argument, use it instead of
size_type_node.
(fold_builtin_strcspn): Add type argument, use it instead of
TREE_TYPE (expr).
Jakub Jelinek [Tue, 11 Mar 2025 10:01:55 +0000 (11:01 +0100)]
tree: Improve skip_simple_arithmetic [PR119183]
The following testcase takes a very long time to compile, because
skip_simple_arithmetic decides to first call tree_invariant_p on
the second argument (and indirectly recurse there). I think before
canonicalization of operands for commutative binary expressions
(and for non-commutative ones always) it is pretty common that the
first operand is a constant, something which tree_invariant_p handles
immediately, so the following patch special cases that; I've added
there a tree_invariant_p call too after the checks, while it is not
really needed currently, tree_invariant_p has the same checks, I wanted
to be prepared in case tree_invariant_p changes. But if you think
I should avoid it, I can drop it too.
This is just a partial fix, I think one can certainly construct a testcase
which will still have horrible compile time complexity (but I've tried and
haven't managed to do so), so perhaps we should just limit the recursion
depth through skip_simple_arithmetic/tree_invariant_p with some defaulted
argument.
2025-03-11 Jakub Jelinek <jakub@redhat.com>
PR c/119183
* tree.cc (skip_simple_arithmetic): If first operand of binary
expr is TREE_CONSTANT or TREE_READONLY with no side-effects, call
tree_invariant_p on that operand first instead of on the second.
Jakub Jelinek [Mon, 10 Mar 2025 09:34:00 +0000 (10:34 +0100)]
libgcc: Fix up unwind-dw2-btree.h [PR119151]
The following testcase shows a bug in unwind-dw2-btree.h.
In short, the header provides lock-free btree data structure (so no parent
link on nodes, both insertion and deletion are done in top-down walks
with some locking of just a few nodes at a time so that lookups can notice
concurrent modifications and retry, non-leaf (inner) nodes contain keys
which are initially the base address of the left-most leaf entry of the
following child (or all ones if there is none) minus one, insertion ensures
balancing of the tree to ensure [d/2, d] entries filled through aggressive
splitting if it sees a full tree while walking, deletion performs various
operations like merging neighbour trees, merging into parent or moving some
nodes from neighbour to the current one).
What differs from the textbook implementations is mostly that the leaf nodes
don't include just address as a key, but address range, address + size
(where we don't insert any ranges with zero size) and the lookups can be
performed for any address in the [address, address + size) range. The keys
on inner nodes are still just address-1, so the child covers all nodes
where addr <= key unless it is covered already in children to the left.
The user (static executables or JIT) should always ensure there is no
overlap in between any of the ranges.
In the testcase a bunch of insertions are done, always followed by one
removal, followed by one insertion of a range slightly different from the
removed one. E.g. in the first case [&code[0x50], &code[0x59]] range
is removed and then we insert [&code[0x4c], &code[0x53]] range instead.
This is valid, it doesn't overlap anything. But the problem is that some
non-leaf (inner) one used the &code[0x4f] key (after the 11 insertions
completely correctly). On removal, nothing adjusts the keys on the parent
nodes (it really can't in the top-down only walk, the keys could be many nodes
above it and unlike insertion, removal only knows the start address, doesn't
know the removed size and so will discover it only when reaching the leaf
node which contains it; plus even if it knew the address and size, it still
doesn't know what the second left-most leaf node will be (i.e. the one after
removal)). And on insertion, if nodes aren't split at a level, nothing
adjusts the inner keys either. If a range is inserted and is either fully
below key (keys are - 1, so having address + size - 1 being equal to key is
fine) or fully after key (i.e. address > key), it works just fine, but if
the key is in a middle of the range like in this case, &code[0x4f] is in the
middle of the [&code[0x4c], &code[0x53]] range, then insertion works fine
(we only use size on the leaf nodes), and lookup of the addresses below
the key work fine too (i.e. [&code[0x4c], &code[0x4f]] will succeed).
The problem is with lookups after the key (i.e. [&code[0x50], &code[0x53]]),
the lookup looks for them in different children of the btree and doesn't
find an entry and returns NULL.
As users need to ensure non-overlapping entries at any time, the following
patch fixes it by adjusting keys during insertion where we know not just
the address but also size; if we find during the top-down walk a key
which is in the middle of the range being inserted, we simply increase the
key to be equal to address + size - 1 of the range being inserted.
There can't be any existing leaf nodes overlapping the range in correct
programs and the btree rebalancing done on deletion ensures we don't have
any empty nodes which would also cause problems.
The patch adjusts the keys in two spots, once for the current node being
walked (the last hunk in the header, with large comment trying to explain
it) and once during inner node splitting in a parent node if we'd otherwise
try to add that key in the middle of the range being inserted into the
parent node (in that case it would be missed in the last hunk).
The testcase covers both of those spots, so succeeds with GCC 12 (which
didn't have btrees) and fails with vanilla GCC trunk and also fails if
either the
  if (fence < base + size - 1)
    fence = iter->content.children[slot].separator = base + size - 1;
or
  if (left_fence >= target && left_fence < target + size - 1)
    left_fence = target + size - 1;
hunk is removed (of course, only with the current node sizes, i.e. up to
15 children of inner nodes and up to 10 entries in leaf nodes).
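The key adjustment can be checked on the numbers from the description with a small standalone sketch. It models only the condition of the second hunk on plain integers; `adjust_fence` and `routed_to_child` are hypothetical helpers, not functions from unwind-dw2-btree.h:

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uintptr_t;

// If the separator key lies inside the inserted range
// [target, target + size - 1], raise it to the last address of the range.
constexpr Addr adjust_fence(Addr fence, Addr target, Addr size) {
  if (fence >= target && fence < target + size - 1)
    return target + size - 1;
  return fence;
}

// A child covers the addresses that are <= its key.
constexpr bool routed_to_child(Addr addr, Addr key) { return addr <= key; }

// Offsets into code[]: key 0x4f, inserted range [0x4c, 0x53], i.e. size 8.
static_assert(!routed_to_child(0x50, 0x4f));  // the failing lookup
static_assert(adjust_fence(0x4f, 0x4c, 8) == 0x53);
static_assert(routed_to_child(0x50, adjust_fence(0x4f, 0x4c, 8)));
// Keys outside the middle of the range are left alone.
static_assert(adjust_fence(0x53, 0x4c, 8) == 0x53);
static_assert(adjust_fence(0x40, 0x4c, 8) == 0x40);
```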
2025-03-10 Jakub Jelinek <jakub@redhat.com>
Michael Leuchtenburg <michael@slashhome.org>
PR libgcc/119151
* unwind-dw2-btree.h (btree_split_inner): Add size argument. If
left_fence is in the middle of [target,target + size - 1] range,
increase it to target + size - 1.
(btree_insert): Adjust btree_split_inner caller. If fence is smaller
than base + size - 1, increase it and separator of the slot to
base + size - 1.
Jakub Jelinek [Thu, 6 Mar 2025 17:26:37 +0000 (18:26 +0100)]
c++: Update TYPE_FIELDS of variant types if cp_parser_late_parsing_default_args etc. modify it [PR98533]
The following testcases ICE during type verification, because TYPE_FIELDS
of e.g. S RECORD_TYPE in pr119123.C is different from TYPE_FIELDS of const S.
Various decls are added to S's TYPE_FIELDS first, then finish_struct
indirectly calls fixup_type_variants to sync the variant copies.
But later on cp_parser_class_specifier calls
cp_parser_late_parsing_default_args and that apparently adds a lambda
type (from default argument) to TYPE_FIELDS of S.
Dunno if that is right or not, assuming it is right, the following
patch fixes it by updating TYPE_FIELDS of variant types if there were
any changes in the various functions cp_parser_class_specifier defers and
calls on the outermost enclosing class.
There was quite a lot of code repetition already before, so the patch
uses a lambda to avoid the repetitions.
To my surprise, in some of the contract testcases (
g++.dg/contracts/contracts-friend1.C
g++.dg/contracts/contracts-nested-class1.C
g++.dg/contracts/contracts-nested-class2.C
g++.dg/contracts/contracts-redecl7.C
g++.dg/contracts/contracts-redecl8.C
) it is actually setting class_type and pushing TRANSLATION_UNIT_DECL
rather than some class types in some cases.
Or should the lambda pushing into the containing class be somehow avoided?
2025-03-06 Jakub Jelinek <jakub@redhat.com>
PR c++/98533
PR c++/119123
* parser.cc (cp_parser_class_specifier): Update TYPE_FIELDS of
variant types in case cp_parser_late_parsing_default_args etc. change
TYPE_FIELDS on the main variant. Add switch_to_class lambda and
use it to simplify repeated class switching code.
* g++.dg/cpp0x/pr98533.C: New test.
* g++.dg/cpp0x/pr119123.C: New test.
Jakub Jelinek [Wed, 5 Mar 2025 13:30:35 +0000 (14:30 +0100)]
value-range: Fix up irange::union_bitmask [PR118953]
The following testcase is miscompiled during evrp.
Before vrp, we have (from ccp):
# RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 VALUE 0x2d
_3 = _2 + 18446744073708503085;
...
# RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 VALUE 0x59
_6 = (long long unsigned int) _5;
# RANGE [irange] int [-INF, +INF] MASK 0xffffc000 VALUE 0x34
_7 = k_11 + -1048524;
switch (_7) <default: <L5> [33.33%], case 8: <L7> [33.33%], case 24: <L6> [33.33%], case 32: <L6> [33.33%]>
...
# RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc07d VALUE 0x0
# i_20 = PHI <_3(4), 0(3), _6(2)>
and evrp is now trying to figure out range for i_20 in range_of_phi.
All the ranges and MASK/VALUE pairs above are correct for the testcase,
k_11 and _2 based on it is a result of multiplication by a constant with low
14 bits cleared and then some numbers are added to it.
There is an obvious missed optimization for which I've filed PR119039,
simplify_switch_using_ranges could see that all the labels but default
are unreachable because the controlling expression has
MASK 0xffffc000 VALUE 0x34 and none of 8, 24 and 32 satisfy that.
Anyway, during range_of_phi for i_20, we process the PHI arguments
in order. For the _3(4) case, we figure out that it is reachable
through the case 24: case 32: labels only of the switch and that
0x34 - 0x2d is 7, so derive
[irange] long long unsigned int [17, 17][25, 25] MASK 0xffffffffffffc000 VALUE 0x2d
(the MASK/VALUE just got inherited from the _3 earlier range).
Now (not surprisingly because those labels aren't actually reachable),
that range is inconsistent, 0x2d is 45, so there is conflict between the
values and the irange_bitmask.
value-range.{h,cc} code differentiates between actually stored
irange_bitmask, which is that MASK 0xffffffffffffc000 VALUE 0x2d, and
semantic bitmask, which is what get_bitmask returns. That is
// The mask inherent in the range is calculated on-demand. For
// example, [0,255] does not have known bits set by default. This
// saves us considerable time, because setting it at creation incurs
// a large penalty for irange::set. At the time of writing there
// was a 5% slowdown in VRP if we kept the mask precisely up to date
// at all times. Instead, we default to -1 and set it when
// explicitly requested. However, this function will always return
// the correct mask.
//
// This also means that the mask may have a finer granularity than
// the range and thus contradict it. Think of the mask as an
// enhancement to the range. For example:
//
// [3, 1000] MASK 0xfffffffe VALUE 0x0
//
// 3 is in the range endpoints, but is excluded per the known 0 bits
// in the mask.
//
// See also the note in irange_bitmask::intersect.
irange_bitmask bm
= get_bitmask_from_range (type (), lower_bound (), upper_bound ());
if (!m_bitmask.unknown_p ())
bm.intersect (m_bitmask);
Now, get_bitmask_from_range here is MASK 0x1f VALUE 0x0 and it intersects
that with that MASK 0xffffffffffffc000 VALUE 0x2d.
Which triggers the ugly special case in irange_bitmask::intersect:
  // If we have two known bits that are incompatible, the resulting
  // bit is undefined. It is unclear whether we should set the entire
  // range to UNDEFINED, or just a subset of it. For now, set the
  // entire bitmask to unknown (VARYING).
  if (wi::bit_and (~(m_mask | src.m_mask),
                   m_value ^ src.m_value) != 0)
    {
      unsigned prec = m_mask.get_precision ();
      m_mask = wi::minus_one (prec);
      m_value = wi::zero (prec);
    }
so the semantic bitmask is actually MASK 0xffffffffffffffff VALUE 0x0.
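The arithmetic can be replayed with a simplified 64-bit stand-in for irange_bitmask. This is a sketch under assumptions: `Bitmask`, `conflicts`, `intersect` and `satisfies` are hypothetical, and the non-conflicting branch of `intersect` is a simplification rather than the value-range.cc code:

```cpp
#include <cassert>
#include <cstdint>

// A set mask bit means "unknown"; a clear mask bit means the bit is known
// and equals the corresponding bit of value.
struct Bitmask {
  std::uint64_t mask;
  std::uint64_t value;
};

// Two bitmasks conflict if some bit is known in both but with different values.
constexpr bool conflicts(Bitmask a, Bitmask b) {
  return (~(a.mask | b.mask) & (a.value ^ b.value)) != 0;
}

// On conflict everything becomes unknown; otherwise known bits are combined.
constexpr Bitmask intersect(Bitmask a, Bitmask b) {
  if (conflicts(a, b))
    return Bitmask{~std::uint64_t{0}, 0};
  return Bitmask{a.mask & b.mask, a.value | b.value};
}

// Does x agree with every known bit?
constexpr bool satisfies(std::uint64_t x, Bitmask bm) {
  return (x & ~bm.mask) == bm.value;
}

constexpr Bitmask from_range{0x1f, 0x0};             // from [17, 17][25, 25]
constexpr Bitmask stored{0xffffffffffffc000, 0x2d};  // inherited from _3

static_assert(conflicts(from_range, stored));  // bit 5 is known in both, differently
static_assert(intersect(from_range, stored).mask == ~std::uint64_t{0});
static_assert(intersect(from_range, stored).value == 0);
static_assert(!satisfies(17, stored) && !satisfies(25, stored));
static_assert(satisfies(45, stored));  // 45 == 0x2d
static_assert(!satisfies(0, stored));
```

The last three assertions show why the stored mask is dangerous once it stops being neutralized by a conflict: only 45 agrees with it, while 0 does not.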
Next, range_of_phi attempts to union it with the 0(3) PHI argument,
and during irange::union_ first adds the [0,0] to the subranges, so
[irange] long long unsigned int [0, 0][17, 17][25, 25] MASK 0xffffffffffffc000 VALUE 0x2d
and then goes on to irange::union_bitmask which does
  if (m_bitmask == r.m_bitmask)
    return false;
  irange_bitmask bm = get_bitmask ();
  irange_bitmask save = bm;
  bm.union_ (r.get_bitmask ());
  if (save == bm)
    return false;
  m_bitmask = bm;
  if (save == get_bitmask ())
    return false;
m_bitmask MASK 0xffffffffffffc000 VALUE 0x2d isn't the same as
r.m_bitmask MASK 0x0 VALUE 0x0, so we compute the semantic bitmask
(but note, not from the original range before union, but the modified one,
dunno if that isn't a problem as well), which is still the VARYING/unknown_p
one, union_ that with MASK 0x0 VALUE 0x0 and get still
MASK 0xffffffffffffffff VALUE 0x0, so don't update anything, the semantic
bitmask didn't change, so we are fine (not!, see later).
Except then we try to union with the third PHI argument. Because the
edge to that comes only from the case 8: label and there is a known difference
between the two, the argument has already been replaced earlier by the
constant 45(2). So, irange::union_ adds the [45, 45] range to the list
of subranges, but voila, 45 is 0x2d and satisfies the stored
MASK 0xffffffffffffc000 VALUE 0x2d, and so the semantic bitmask changed
from MASK 0xffffffffffffffff VALUE 0x0 to MASK 0xffffffffffffc000 VALUE 0x2d
by that addition. Eventually, we just optimize this to
[irange] long long unsigned int [45, 45] because that is the only range
which satisfies the bitmask. And that is wrong, at runtime i_20 has
value 0.
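To see why only [45, 45] survives, here is a small sketch of the membership test implied by a mask/value pair. The helper name satisfies is hypothetical and uint64_t stands in for wide_int.

```cpp
#include <cstdint>

// True if `v` agrees with every known bit of the pair, i.e. with every
// bit position where `mask` is 0.
inline bool satisfies(std::uint64_t v, std::uint64_t mask, std::uint64_t value) {
  return (v & ~mask) == (value & ~mask);
}
```

With MASK 0xffffffffffffc000 VALUE 0x2d the low 14 bits are known, so 45 (0x2d) passes while 0, 17 and 25 do not. With the VARYING pair MASK 0xffffffffffffffff VALUE 0x0 every value passes.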
The following patch attempts to detect this case where get_bitmask
turns some non-VARYING m_bitmask into VARYING one because of a conflict
and in that case makes sure m_bitmask is actually updated rather than
unmodified, so that later union_ doesn't cause problems.
I also wonder whether e.g. get_bitmask couldn't have special case for this
and if bm.intersect (m_bitmask); yields unknown_p from something not
originally unknown_p, perhaps chooses to just use get_bitmask_from_range
value and ignore the stored m_bitmask. Though, dunno how union_bitmask
in that case would figure out it needs to update m_bitmask.
2025-03-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118953
* value-range.cc (irange::union_bitmask): Update m_bitmask if
get_bitmask () is unknown_p and m_bitmask is not even when the
semantic bitmask didn't change and returning false.
Jakub Jelinek [Sat, 1 Mar 2025 08:15:57 +0000 (09:15 +0100)]
openmp: Fix up simd clone mask argument creation on x86 [PR115871]
The following testcase ICEs since r14-5057.
The Intel vector ABI says that in the ZMM case the mask is passed
in unsigned int or unsigned long long arguments, and that the number of bits
in each and the number of those arguments are determined by the characteristic
data type of the function. In the testcase simdlen is 32 and the characteristic
data type is double, so the return value as well as the first argument are
passed in 4 V8DFmode arguments and the mask is supposed to be passed in
4 unsigned int arguments (8 bits in each).
Before the r14-5057 change there was
sc->args[i].orig_type = parm_type;
...
case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
case SIMD_CLONE_ARG_TYPE_VECTOR:
if (INTEGRAL_TYPE_P (parm_type) || POINTER_TYPE_P (parm_type))
veclen = sc->vecsize_int;
else
veclen = sc->vecsize_float;
if (known_eq (veclen, 0U))
veclen = sc->simdlen;
else
veclen
= exact_div (veclen,
GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
for the argument handling and
if (sc->inbranch)
{
tree base_type = simd_clone_compute_base_data_type (sc->origin, sc);
...
if (INTEGRAL_TYPE_P (base_type) || POINTER_TYPE_P (base_type))
veclen = sc->vecsize_int;
else
veclen = sc->vecsize_float;
if (known_eq (veclen, 0U))
veclen = sc->simdlen;
else
veclen = exact_div (veclen,
GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
for the mask handling. r14-5057 moved this argument creation later and
unified that:
case SIMD_CLONE_ARG_TYPE_MASK:
case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
case SIMD_CLONE_ARG_TYPE_VECTOR:
if (sc->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK
&& sc->mask_mode != VOIDmode)
elem_type = boolean_type_node;
else
elem_type = TREE_TYPE (sc->args[i].vector_type);
if (INTEGRAL_TYPE_P (elem_type) || POINTER_TYPE_P (elem_type))
veclen = sc->vecsize_int;
else
veclen = sc->vecsize_float;
if (known_eq (veclen, 0U))
veclen = sc->simdlen;
else
veclen
= exact_div (veclen,
GET_MODE_BITSIZE (SCALAR_TYPE_MODE (elem_type)));
This is correct for the argument cases (so linear or vector) (though
POINTER_TYPE_P will never appear as TREE_TYPE of a vector), but the
boolean_type_node in there is completely bogus, when using AVX512 integer
masks as I wrote above we need the characteristic data type, not bool,
and bool is strange in that it has bitsize of 8 (or 32 on darwin), while
the masks are 1 bit per lane anyway.
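The arithmetic behind this can be sketched as follows. The helper names are hypothetical, and the 512-bit vector size for the ZMM clone is an assumption taken from V8DFmode holding 8 doubles.

```cpp
// Lanes covered by one argument: vector size in bits divided by the
// element size in bits (the exact_div in the quoted code).
constexpr unsigned lanes_per_arg(unsigned vecsize_bits, unsigned elem_bits) {
  return vecsize_bits / elem_bits;
}

// Number of arguments needed to cover all simdlen lanes.
constexpr unsigned arg_count(unsigned simdlen, unsigned lanes) {
  return simdlen / lanes;
}

// Characteristic type double (64 bits): 8 lanes per argument, 4 arguments.
static_assert(lanes_per_arg(512, 64) == 8, "one V8DFmode argument");
static_assert(arg_count(32, lanes_per_arg(512, 64)) == 4, "4 mask arguments");

// An 8-bit bool as element type instead gives 64 lanes per argument, which
// no longer matches the 8 lanes of each vector argument.
static_assert(lanes_per_arg(512, 8) == 64, "bogus lane count");
```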
Fixed thusly.
2025-03-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/115871
* omp-simd-clone.cc (simd_clone_adjust): For SIMD_CLONE_ARG_TYPE_MASK
and sc->mask_mode not VOIDmode, set elem_type to the characteristic
type rather than boolean_type_node.
I've added the asserts that probe == target because {REAL,IMAG}PART_EXPR
always implies a scalar type and so applying ARRAY_REF/COMPONENT_REF
etc. on it further doesn't make sense and the later code relies on it
to be the last one in refs array. But as the following testcase shows,
we can fail those assertions in case there is a reference or pointer
to the __real__ or __imag__ part, in that case we just evaluate the
constant expression and so probe won't be the same as target.
That case doesn't push anything into the refs array though.
The following patch changes those asserts to verify that refs is still
empty, which fixes it.
2025-02-28 Jakub Jelinek <jakub@redhat.com>
PR c++/119045
* constexpr.cc (cxx_eval_store_expression) <case REALPART_EXPR>:
Assert that refs->is_empty () rather than probe == target.
(cxx_eval_store_expression) <case IMAGPART_EXPR>: Likewise.
Jakub Jelinek [Wed, 26 Feb 2025 18:29:12 +0000 (19:29 +0100)]
c: stddef.h C23 fixes [PR114870]
The stddef.h header for C23 defines __STDC_VERSION_STDDEF_H__ and
unreachable macros multiple times in some cases.
The header doesn't have a normal multiple inclusion guard, because it supports
glibc inclusion with __need_{size_t,wchar_t,ptrdiff_t,wint_t,NULL}.
The definition of __STDC_VERSION_STDDEF_H__ and unreachable is done
solely in the #ifdef _STDDEF_H part, so they are defined only if stddef.h
is included without those __need_* macros defined. But once
stddef.h is included without the __need_* macros, _STDDEF_H is defined,
and while further stddef.h includes without __need_* macros don't do
anything:
#if (!defined(_STDDEF_H) && !defined(_STDDEF_H_) && !defined(_ANSI_STDDEF_H) \
&& !defined(__STDDEF_H__)) \
|| defined(__need_wchar_t) || defined(__need_size_t) \
|| defined(__need_ptrdiff_t) || defined(__need_NULL) \
|| defined(__need_wint_t)
if one includes whole stddef.h first and then stddef.h with some of the
__need_* macros defined, the #ifdef _STDDEF_H part is used again.
It isn't that big a deal for most cases, as it uses extra guarding macros
like:
#ifndef _GCC_MAX_ALIGN_T
#define _GCC_MAX_ALIGN_T
...
#endif
etc., but for __STDC_VERSION_STDDEF_H__/unreachable nothing like that is
used.
So, either we do what the following patch does and just don't define
__STDC_VERSION_STDDEF_H__/unreachable a second time, or use #ifndef
unreachable separately for the #define unreachable() case, or use a
new _GCC_STDC_VERSION_STDDEF_H macro to guard this (or two, one for
__STDC_VERSION_STDDEF_H__ and one for unreachable), or rework the initial
condition to be just
#if !defined(_STDDEF_H) && !defined(_STDDEF_H_) && !defined(_ANSI_STDDEF_H) \
&& !defined(__STDDEF_H__)
- I really don't understand why the header should do anything at all after
it has been included once without __need_* macros. But changing how this
behaves after 35 years might be risky for various OS/libc combinations.
2025-02-26 Jakub Jelinek <jakub@redhat.com>
PR c/114870
* ginclude/stddef.h (__STDC_VERSION_STDDEF_H__, unreachable): Don't
redefine multiple times if stddef.h is first included without __need_*
defines and later with them. Move nullptr_t and unreachable and
__STDC_VERSION_STDDEF_H__ definitions into the same
defined (__STDC_VERSION__) && __STDC_VERSION__ > 201710L #if block.
Jakub Jelinek [Tue, 25 Feb 2025 08:33:21 +0000 (09:33 +0100)]
openmp: Mark OpenMP atomic write expression as read [PR119000]
The following testcase was emitting a false positive warning that
the rhs of #pragma omp atomic write was stored but not read,
when the atomic actually does read it. The following patch
fixes that by calling default_function_array_read_conversion
on it, so that it is marked as read as well as converted from
lvalue to rvalue.
Furthermore, the code had
if (code == NOP_EXPR) ... else ... if (code == NOP_EXPR) ...
with none of ... parts changing code, so I've merged the two ifs.
2025-02-25 Jakub Jelinek <jakub@redhat.com>
PR c/119000
* c-parser.cc (c_parser_omp_atomic): For omp write call
default_function_array_read_conversion on the rhs expression.
Merge the two adjacent if (code == NOP_EXPR) blocks.
Jakub Jelinek [Mon, 24 Feb 2025 11:19:16 +0000 (12:19 +0100)]
reassoc: Fix up optimize_range_tests_to_bit_test [PR118915]
The following testcase is miscompiled due to a bug in
optimize_range_tests_to_bit_test. It is trying to optimize a
check for a in [-34,-34] or [-26,-26] or [-6,-6] or [-4,inf] ranges.
Another reassoc optimization folds the test for the first
two ranges into (a + 34U) & ~8U in [0U,0U] range, and extract_bit_test_mask
actually has code to virtually undo it and treat that again as test
for a being -34 or -26. The problem is that optimize_range_tests_to_bit_test
remembers in the type variable TREE_TYPE (ranges[i].exp); from the first
range. If extract_bit_test_mask doesn't do that virtual undoing of the
BIT_AND_EXPR handling, that is just fine, the returned exp is ranges[i].exp.
But if the first range is BIT_AND_EXPR, the type could be different, the
BIT_AND_EXPR form has the optional cast to corresponding unsigned type
in order to avoid introducing UB. Now, type was used to fill in the
max value if ranges[j].high was missing in a subsequently tested range,
and so in this particular testcase the [-4,inf] range, which was
signed int and so [-4,INT_MAX], was treated as [-4,UINT_MAX] instead.
And we were subtracting values of 2 different types and trying to make
sense out of that.
The following patch fixes this by using the type of the low bound
(which is always non-NULL) for the max value of the high bound instead.
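As an illustration only (the function names below are hypothetical and this is not the reassoc code), the folded form of the first two ranges and the choice of the implied upper bound can be sketched as:

```cpp
#include <limits>

// Folded form: (a + 34U) & ~8U == 0. The cast to unsigned avoids signed
// overflow, which is why the BIT_AND_EXPR form uses the unsigned type.
inline bool folded_test(int a) {
  return ((static_cast<unsigned>(a) + 34U) & ~8U) == 0U;
}

// What the folded form stands for: a is -34 or -26.
inline bool original_test(int a) {
  return a == -34 || a == -26;
}

// When a range has no explicit high bound, the implied maximum has to come
// from the type of the low bound, not from the type of some other range.
template <typename Low>
constexpr Low implied_high(Low /*low*/) {
  return std::numeric_limits<Low>::max();
}
```

For the signed low bound -4 this gives INT_MAX, whereas taking the maximum from the unsigned type of the folded first range would give UINT_MAX.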
2025-02-24 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118915
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): For
highj == NULL_TREE use TYPE_MAX_VALUE (TREE_TYPE (lowj)) rather
than TYPE_MAX_VALUE (type).
Alex Coplan [Mon, 10 Mar 2025 16:44:15 +0000 (16:44 +0000)]
df: Treat partial defs as uses in df_simulate_defs [PR116564]
The PR shows us spinning in dce.cc:fast_dce at the start of combine.
This spinning appears to be because of a disagreement between the fast_dce code
and the code in df-problems.cc:df_lr_bb_local_compute. Specifically, they
disagree on the treatment of partial defs. For the testcase in the PR, we have
the following insn in bb 3:
i.e. it models partial defs as a RMW operation; thus for the def arising
from i10 above, it records a use of r104; hence it ends up in the
live-in set for bb 3.
However, as it stands, the code in dce.cc:fast_dce (and its callee
dce_process_block) has no such provision for DF_REF_PARTIAL defs. It
does not treat these as a RMW and does not compute r104 above as being
live-in to bb 3. At the end of dce_process_block we compute the
following "did something happen" condition used to decide termination of
the analysis:
because of the disagreement between df_lr_local_compute and the local
analysis done by fast_dce, we invariably have r104 in DF_LR_IN, but not
in local_live. Hence we always return true here, call
df_analyze_problem (which re-computes DF_LR_IN according to
df_lr_bb_local_compute, re-adding r104), and so the analysis never
terminates.
This patch therefore adjusts df_simulate_defs (called from
dce_process_block) to match the behaviour of df_lr_bb_local_compute in
this respect, namely we make it model partial defs as RMW operations by
setting the relevant register live. This fixes the spinning in fast_dce
for this testcase.
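A toy model of the adjusted behaviour, with hypothetical names (ToyDef, simulate_defs) and a std::set standing in for the live bitmap; it only illustrates the RMW treatment and is not the df code itself:

```cpp
#include <set>
#include <vector>

// One register definition in an insn; `partial` models DF_REF_PARTIAL.
struct ToyDef {
  int regno;
  bool partial;
};

// Backward liveness step over the defs of a single insn.
inline void simulate_defs(const std::vector<ToyDef> &defs, std::set<int> &live) {
  for (const ToyDef &d : defs) {
    if (d.partial)
      live.insert(d.regno);  // read-modify-write: the old value is still needed
    else
      live.erase(d.regno);   // a full def kills the register
  }
}
```

With this rule a partial def of r104 leaves r104 live before the insn, so the local result agrees with a live-in set that contains r104.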
gcc/ChangeLog:
PR rtl-optimization/116564
* df-problems.cc (df_simulate_defs): For partial defs, mark the
register live (treat it as a RMW operation).
gcc/testsuite/ChangeLog:
PR rtl-optimization/116564
* gcc.target/aarch64/torture/pr116564.c: New test.
Jonathan Wakely [Tue, 25 Mar 2025 00:27:52 +0000 (00:27 +0000)]
libstdc++: Allow std::ranges::to to create unions
LWG 4229 points out that the std::ranges::to wording refers to class
types, but I added an assertion using std::is_class_v which only allows
non-union class types. LWG consensus is that unions should be allowed,
so this additionally uses std::is_union_v.
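A minimal sketch of the difference between the two traits. ToyUnion, ToyStruct and is_class_or_union_v are hypothetical names, not what the library uses:

```cpp
#include <type_traits>

union ToyUnion { int i; float f; };
struct ToyStruct { int i; };

// The standard's "class type" covers structs, classes and unions, but
// std::is_class_v is false for unions, so both traits are needed.
template <typename T>
inline constexpr bool is_class_or_union_v =
    std::is_class_v<T> || std::is_union_v<T>;

static_assert(!std::is_class_v<ToyUnion>);
static_assert(std::is_union_v<ToyUnion>);
static_assert(is_class_or_union_v<ToyUnion>);
static_assert(is_class_or_union_v<ToyStruct>);
static_assert(!is_class_or_union_v<int>);
```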
libstdc++-v3/ChangeLog:
* include/std/ranges (ranges::to): Allow unions as well as
non-union class types.
* testsuite/std/ranges/conv/lwg4229.cc: New test.
Jonathan Wakely [Thu, 27 Feb 2025 15:48:49 +0000 (15:48 +0000)]
libstdc++: Add static_assertions to ranges::to adaptor factory [PR112803]
The standard requires that we reject attempts to create a ranges::to
adaptor for cv-qualified types and non-class types. Currently we only
diagnose it once the adaptor is used in a pipeline.
This adds static assertions to diagnose it immediately.
libstdc++-v3/ChangeLog:
PR libstdc++/112803
* include/std/ranges (ranges::to): Add static assertions to
enforce Mandates conditions.
* testsuite/std/ranges/conv/112803.cc: New test.
Jonathan Wakely [Tue, 16 Jul 2024 08:43:06 +0000 (09:43 +0100)]
libstdc++: Define operator== for hash table iterators [PR115939]
Currently iterators for unordered containers do not directly define
operator== and operator!= overloads. Instead they rely on the base class
defining them, which is done so that iterator and const_iterator
comparisons work using the same overloads.
However this means a derived-to-base conversion is needed to call those
operators, and PR libstdc++/115939 shows that this can be ambiguous (for
-pedantic) when another overloaded operator could be used after an
implicit conversion.
This change defines operator== and operator!= directly for
_Node_iterator and _Node_const_iterator so that no derived-to-base
conversions are needed. The new overloads just forward to the base class
ones, so the implementation is still shared and doesn't need to be
duplicated.
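The forwarding pattern can be sketched like this, using hypothetical stand-in types rather than the real _Node_iterator classes:

```cpp
// Stand-in for the shared iterator base that owns the comparison logic.
struct NodeIterBase {
  const void *cur = nullptr;

  friend bool operator==(const NodeIterBase &a, const NodeIterBase &b) noexcept {
    return a.cur == b.cur;
  }
  friend bool operator!=(const NodeIterBase &a, const NodeIterBase &b) noexcept {
    return a.cur != b.cur;
  }
};

// Stand-in for a derived iterator: its own overloads are exact matches,
// so comparing two NodeIter objects needs no derived-to-base conversion.
struct NodeIter : NodeIterBase {
  friend bool operator==(const NodeIter &a, const NodeIter &b) noexcept {
    const NodeIterBase &ba = a;
    const NodeIterBase &bb = b;
    return ba == bb;  // forward to the shared base implementation
  }
  friend bool operator!=(const NodeIter &a, const NodeIter &b) noexcept {
    const NodeIterBase &ba = a;
    const NodeIterBase &bb = b;
    return ba != bb;
  }
};
```

Binding the operands to base references first keeps the forwarding call from selecting the derived overload again.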
libstdc++-v3/ChangeLog:
PR libstdc++/115939
* include/bits/hashtable_policy.h (_Node_iterator): Add
operator== and operator!=.
(_Node_const_iterator): Likewise.
* testsuite/23_containers/unordered_map/115939.cc: New test.
Martin Jambor [Fri, 7 Mar 2025 16:17:24 +0000 (17:17 +0100)]
ipa-cp: Avoid ICE when redistributing nodes among edges to recursive clones (PR 118318)
PR 118318 reported an ICE during a PGO build of Firefox in IPA-CP, in
the final stages of update_counts_for_self_gen_clones, where it
attempts to guess how to distribute the profile count among clones created
for recursive edges and the various edges that are created in the
process. If one such edge has a profile count of kind GUESSED_GLOBAL0,
the compatibility check in operator+ will lead to an ICE. After
discussing the situation with Honza, we concluded that there is little
more we can do other than check for this situation before touching the
edge count, so this is what this patch does.
gcc/ChangeLog:
2025-02-28 Martin Jambor <mjambor@suse.cz>
PR ipa/118318
* ipa-cp.cc (adjust_clone_incoming_counts): Add a compatible_p check.
Patrick Palka [Thu, 13 Mar 2025 13:15:21 +0000 (09:15 -0400)]
libstdc++: Fix ref_view branch of views::as_const [PR119135]
Unlike for span<X> and empty_view<X>, the range_reference_t of
ref_view<X> doesn't correspond to X. This patch fixes the ref_view
branch of views::as_const to correctly query its underlying range
type X.
PR libstdc++/119135
libstdc++-v3/ChangeLog:
* include/std/ranges: Include <utility>.
(views::__detail::__is_ref_view): Replace with ...
(views::__detail::__is_constable_ref_view): ... this.
(views::_AsConst::operator()): Replace bogus use of element_type
in the ref_view branch.
* testsuite/std/ranges/adaptors/as_const/1.cc (test03): Extend
test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit 50359c0a44381edb6dbd9359ef2ebdadbcc3ed42)
=== cut here ===
struct span {
span (const int (&__first)[1]) : _M_ptr (__first) {}
int operator[] (long __i) { return _M_ptr[__i]; }
const int *_M_ptr;
};
void foo () {
constexpr int a_vec[]{1};
auto vec{[&a_vec]() -> span { return a_vec; }()};
}
=== cut here ===
The problem is that perform_implicit_conversion_flags (via
mark_rvalue_use) replaces "a_vec" in the return statement by a
CONSTRUCTOR representing a_vec's constant value, and then takes its
address when invoking span's constructor. So we end up with an instance
that points to garbage instead of a_vec's storage.
As per Jason's suggestion, this patch simply removes the calls to
mark_*_use from perform_implicit_conversion_flags, which fixes the PR.
Haochen Jiang [Mon, 24 Mar 2025 07:51:16 +0000 (15:51 +0800)]
i386: Add -mavx10.1 back with 512 bit alias
When AVX10.1 options were added into GCC 14, E-core was supposed to
support up to 256 bit vector width, while P-core up to 512 bit vector
width. Therefore, we added avx10.1-256 and avx10.1-512 options into the
compiler and aliased avx10.1 to 256 bit for compatibility, since there
would be real platforms with 256 bit only support.
However, all the future platforms will now support 512 bit vector width,
including P-core and E-core. Therefore, we can alias avx10.1 directly
to 512 bit. However, the avx10.1 alias to 256 bit has been there in GCC 14.1
and GCC 14.2, so we have to raise a warning starting with GCC 14.3 for this
behavior change.
While backporting the patch from GCC 15, we chose to warn only when
users use the -mavx10.1 option, so as not to disrupt the use of other
options: although -mavx10.1-256/512 and -mevex512 will be dropped in GCC 16,
there is no need to overwhelm users with warnings this early in GCC 14.
Simon Martin [Mon, 24 Mar 2025 07:15:54 +0000 (08:15 +0100)]
c++: Don't mix timevar_start and auto_cond_timevar for TV_NAME_LOOKUP [PR116681]
We currently ICE upon the following testcase when using -ftime-report
=== cut here ===
template < int> using __conditional_t = int;
template < typename _Iter >
concept random_access_iterator = requires { new _Iter; };
template < typename _Iterator >
struct reverse_iterator {
using iterator_concept =
__conditional_t< random_access_iterator< _Iterator>>;
};
void RemoveBottom() {
int iter;
for (reverse_iterator< int > iter;;)
;
}
=== cut here ===
The problem is that qualified_namespace_lookup does a plain start() of
the TV_NAME_LOOKUP timer (that asserts that the timer is not already
started). However this timer has already been cond_start()'d in the call
stack - by pushdecl - so the assert fails.
This patch simply ensures that we always conditionally start this timer
(which is done in all other places that use it).
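The difference between a plain start and a conditional start can be sketched with a toy timer. ToyTimer and AutoCondTimer are hypothetical classes for illustration, not the timevar API:

```cpp
#include <cassert>

// Toy timer: start() insists the timer is idle, cond_start() does not.
class ToyTimer {
  bool running_ = false;

public:
  bool running() const { return running_; }

  void start() {
    assert(!running_ && "timer already started");
    running_ = true;
  }

  void stop() {
    assert(running_);
    running_ = false;
  }

  // Start only if idle; report whether this call did the starting.
  bool cond_start() {
    if (running_)
      return false;
    running_ = true;
    return true;
  }
};

// RAII guard: stops the timer only if this guard started it, so nested
// guards on the same timer are harmless.
class AutoCondTimer {
  ToyTimer &timer_;
  bool started_;

public:
  explicit AutoCondTimer(ToyTimer &t) : timer_(t), started_(t.cond_start()) {}
  ~AutoCondTimer() {
    if (started_)
      timer_.stop();
  }
  AutoCondTimer(const AutoCondTimer &) = delete;
  AutoCondTimer &operator=(const AutoCondTimer &) = delete;
};
```

In this model, calling start() while an outer guard is active would hit the assertion, which mirrors the failure described above; a nested guard simply does nothing.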
PR c++/116681
gcc/cp/ChangeLog:
* name-lookup.cc (qualified_namespace_lookup): Use an
auto_cond_timer instead of using timevar_start and timevar_stop.
Iain Buclaw [Sun, 23 Mar 2025 11:57:27 +0000 (12:57 +0100)]
d: Fix ICE type variant differs by TYPE_PACKED [PR117621]
Introduced by r13-1104-gf4c3ce32fa54c1, which had an accidental self
assignment of TYPE_PACKED when it should have been assigned to the
type's variants.
PR d/117621
gcc/d/ChangeLog:
* types.cc (finish_aggregate_type): Propagate TYPE_PACKED to variants.
Martin Uecker [Sat, 22 Mar 2025 16:05:51 +0000 (17:05 +0100)]
c: minor fixes related to arrays of unspecified size
The patch for PR117145 and PR117245 also fixed PR100420 and PR116284 which
are bugs related to arrays of unspecified size. Those are now represented
as variable size arrays with size (0, 0). There are still some loose ends,
which are resolved here by
1. adding a testcase for PR116284,
2. moving code related to creation and detection of arrays of unspecified
sizes in their own functions,
3. preferring a specified size over an unspecified size when forming
a composite type as required by C99 (PR118391)
4. removing useless code in comptypes_internal and composite_type_internal.
This fixes two cases where variably-modified types were not recognized as
such. The first is when building composite types and the other when a type
is reconstructed for the 'vector' attribute. Construction of types in
the C FE is reorganized to use c_build_* functions which are responsible for
setting C_TYPE_VARIABLE_SIZE, C_TYPE_VARIABLY_MODIFIED and TYPE_TYPELESS_STORAGE
based on the properties of the type itself and these replace all other logic
elsewhere (e.g. in grokdeclarator). A new 'c_reconstruct_complex_type' based
on these functions is introduced which is called via a language hook when the
'vector' attribute is processed (as for C++).
One problem is arrays of unspecified size 'T[*]', which were represented
identically to zero-sized arrays but with C_TYPE_VARIABLE_SIZE set. To avoid
having to create distinct type copies for this, the representation was changed
to make it a natural VLA by giving it an upper bound of '(0, 0)'. This also
then allows fixing of PR100420 where such arrays were printed as 'T[0]'.
Finally, a new function 'c_verify_type' checks consistency of properties
specific to C FE and is called when checking is on.
Here we ICE during instantiation of the dependently scoped template
friend
template<int N>
struct<class T>
friend class A<N>::B;
ultimately because processing_template_decl isn't set during
substitution into the A<N> scope. Since it's naturally a partial
substitution, we need to make sure the flag is set.
For GCC 15, this is already fixed similarly by r15-123.
PR c++/119378
gcc/cp/ChangeLog:
* pt.cc (tsubst) <case UNBOUND_CLASS_TEMPLATE>: Set
processing_template_decl when substituting the context.
Jason Merrill [Thu, 20 Mar 2025 16:57:15 +0000 (12:57 -0400)]
ipa: target clone and mangling alias [PR114992]
Since the mangling of the second lambda changed (previously we counted all
lambdas, now we only count lambdas with the same signature), we
generate_mangling_alias for handler<lambda2> for backward compatibility.
Since handler is COMDAT, resolve_alias puts the alias in the same comdat
group as handler itself. Then create_dispatcher_calls tries to add the
alias to the same comdat group as the dispatcher, but it's already in a
same_comdat_group, so we ICE.
It seems like we're just missing a remove_from_same_comdat_group before
add_to_same_comdat_group.
PR c++/114992
gcc/ChangeLog:
* multiple_target.cc (create_dispatcher_calls):
remove_from_same_comdat_group before add_to_same_comdat_group.
Filip Kastl [Thu, 20 Mar 2025 10:54:59 +0000 (11:54 +0100)]
gimple: sccopy: Don't increment i after vec::unordered_remove()
I increment the index variable in a loop even when I do
vec::unordered_remove() which causes the vector traversal to miss some
elements. Mikael notified me of this mistake I made in my last patch.
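The pitfall is generic to swap-with-last removal. A sketch with std::vector standing in for GCC's vec (unordered_remove_if is a hypothetical helper, not part of either API):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Remove every element matching `pred`, without preserving order.
template <typename T, typename Pred>
void unordered_remove_if(std::vector<T> &v, Pred pred) {
  for (std::size_t i = 0; i < v.size();) {
    if (pred(v[i])) {
      // Overwrite slot i with the last element, then shrink the vector.
      if (i + 1 != v.size())
        v[i] = std::move(v.back());
      v.pop_back();
      // Do not advance: slot i now holds an element not yet examined.
    } else {
      ++i;
    }
  }
}
```

Incrementing the index after a removal would skip whichever element was just moved into the vacated slot.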
gcc/ChangeLog:
* gimple-ssa-sccopy.cc (scc_copy_prop::propagate): Don't
increment after vec::unordered_remove().
Simon Martin [Thu, 20 Mar 2025 19:36:26 +0000 (20:36 +0100)]
c++: Don't prune constant capture proxies only used in array dimensions [PR114292]
We currently ICE upon the following valid (under -Wno-vla) code
=== cut here ===
void f(int c) {
constexpr int r = 4;
[&](auto) { int t[r * c]; }(0);
}
=== cut here ===
When parsing the lambda body, and more specifically the multiplication,
we mark the lambda as LAMBDA_EXPR_CAPTURE_OPTIMIZED, which indicates to
prune_lambda_captures that it might be possible to optimize out some
captures.
The problem is that prune_lambda_captures then misses the use of the r
capture (because neither walk_tree_1 nor cp_walk_subtrees walks the
dimensions of array types - here "r * c"), hence believes the capture
can be pruned... and we trip on an assert when instantiating the lambda.
This patch changes cp_walk_subtrees so that (1) when walking a
DECL_EXPR, it also walks the DECL's type, and (2) when walking an
INTEGER_TYPE and processing a template declaration, it also walks its
TYPE_{MIN,MAX}_VALUE.
PR c++/114292
gcc/cp/ChangeLog:
* tree.cc (cp_walk_subtrees): Walk the type of DECL_EXPR
declarations, as well as the TYPE_{MIN,MAX}_VALUE of
INTEGER_TYPEs for template declarations.
Jason Merrill [Wed, 19 Mar 2025 09:15:00 +0000 (05:15 -0400)]
c++: mangling of array new [PR119316]
Because we build an array type to represent an array new, we hit a VLA
error in compute_array_index_type for a variable length array new. To avoid
this, let's build the MINUS_EXPR and index type directly.
I also noticed that the non-constant case in write_array_type was assuming
MINUS_EXPR without verifying it, so I added a checking_assert.
I also noticed that Clang doesn't mangle the length of an array new at all,
so I opened https://github.com/itanium-cxx-abi/cxx-abi/issues/199 to clarify
this.
PR c++/119316
gcc/cp/ChangeLog:
* mangle.cc (write_expression) [NEW_EXPR]: Avoid using
compute_array_index_type.
(write_array_type): Add checking_assert.
Patrick Palka [Tue, 18 Mar 2025 15:38:33 +0000 (11:38 -0400)]
c++: memfn pointer as NTTP argument considered unused [PR119233]
This is just the member function pointer version of PR c++/105848,
in which our non-dependent call pruning may cause us to not mark an
otherwise unused function pointer template argument as used.
PR c++/119233
gcc/cp/ChangeLog:
* pt.cc (mark_template_arguments_used): Also handle member
function pointers.
Eric Botcazou [Wed, 19 Mar 2025 07:55:04 +0000 (08:55 +0100)]
Fix misoptimization at -O2 in LTO mode
This is a regression in recent releases. The problem is that the IPA mod/ref
pass looks through the (nominal) type of a pointer-to-discriminated-type
parameter in a call to a subprogram in order to see the (actual) type used
for the dereferences of the parameter in the callee, which is a
pointer-to-constrained-subtype.
Historically the discriminated type is marked with the may_alias attribute
because of the symmetric effect for the argument in the caller, so we mark
the constrained subtype with the attribute now for the sake of the callee.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Record_Subtype>: Set
the may_alias attribute if a specific GCC type is built.
Eric Botcazou [Wed, 19 Mar 2025 07:22:33 +0000 (08:22 +0100)]
Fix spurious visibility error with partially parameterized formal package
This is not a regression but the issue is quite annoying and the fix is
trivial. The problem is that a formal parameter covered by a box in the
formal package is not visible in the instance when it comes after another
formal parameter that is also a formal package.
It comes from a discrepancy internal to Instantiate_Formal_Package, where
a specific construct (the abbreviated instance) built for the nested formal
package discombobulates the processing done for the outer formal package.
gcc/ada/
* gen_il-gen-gen_nodes.adb (N_Formal_Package_Declaration): Use
N_Declaration instead of Node_Kind as ancestor.
* sem_ch12.adb (Get_Formal_Entity): Remove obsolete alternative.
(Instantiate_Formal_Package): Take into account the abbreviated
instances in the main loop running over the actuals of the local
package created for the formal package.
gcc/testsuite/
* gnat.dg/generic_inst14.adb: New test.
* gnat.dg/generic_inst14_pkg.ads: New helper.
* gnat.dg/generic_inst14_pkg-child.ads: Likewise.
Jason Merrill [Tue, 18 Mar 2025 18:44:08 +0000 (14:44 -0400)]
c++: constexpr ref template arg [PR119194]
Here we were assuming that a constant variable appearing in a template
argument is used for its value. We also need to handle seeing its address
taken.
PR c++/119194
gcc/cp/ChangeLog:
* decl2.cc (min_vis_expr_r) [ADDR_EXPR]: New case.
Marek Polacek [Mon, 17 Mar 2025 21:46:02 +0000 (17:46 -0400)]
c++: ICE with ptr-to-member-fn [PR119344]
This ICE appeared with the removal of NON_DEPENDENT_EXPR. Previously
skip_simple_arithmetic would get NON_DEPENDENT_EXPR<CAST_EXPR<>> and
since NON_DEPENDENT_EXPR is neither BINARY_CLASS_P nor UNARY_CLASS_P,
there was no problem. But now we pass just CAST_EXPR<> and a CAST_EXPR
is a tcc_unary, so we extract its null operand and crash.
skip_simple_arithmetic is called from save_expr. cp_save_expr already
avoids calling save_expr in a template, so that seems like an appropriate
way to fix this.
PR c++/119344
gcc/cp/ChangeLog:
* typeck.cc (cp_build_binary_op): Use cp_save_expr instead of save_expr.
gcc/testsuite/ChangeLog:
* g++.dg/conversion/ptrmem10.C: New test.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 6fc1f70f0b7b50fd85aa58a0f29dd1e17f2113d1)
Marek Polacek [Mon, 17 Mar 2025 16:56:40 +0000 (12:56 -0400)]
c++: ICE when substituting packs into type aliases [PR118104]
r12-1094 mentions that adding the assert didn't lead to any regressions
in the testsuite, but this test case demonstrates that we can reach it
with valid code.
Here we arrive in use_pack_expansion_extra_args_p with t which is an
expansion whose pattern is void(Ts, Us) and tparm packs are {Us, Ts},
and parm_packs is { Ts -> <int, int>, Us -> <A, P...> }. We want to
expand the pack into void(int, A) and void(int, P...). We compare
int to A, which is fine, but then int to P... which crashes. But
the code is valid so this patch removes the assert.
PR c++/118104
gcc/cp/ChangeLog:
* pt.cc (use_pack_expansion_extra_args_p): Remove an assert.
Patrick Palka [Fri, 28 Feb 2025 14:39:57 +0000 (09:39 -0500)]
libstdc++: Fix constraint recursion in basic_const_iterator relops [PR112490]
Here for
using RCI = reverse_iterator<basic_const_iterator<vector<int>::iterator>>
static_assert(std::totally_ordered<RCI>);
we effectively need to check the requirement
requires (RCI x) { x RELOP x; } for each RELOP in {<, >, <=, >=}
which we expect to be straightforwardly satisfied by reverse_iterator's
namespace-scope relops. But due to ADL we find ourselves also
considering the basic_const_iterator relop friends, which before CWG
2369 would be quickly discarded since RCI clearly isn't convertible to
basic_const_iterator. After CWG 2369 though we must first check these
relops' constraints (with _It = vector<int>::iterator and _It2 = RCI),
which entails checking totally_ordered<RCI> recursively.
This patch fixes this by turning the problematic non-dependent function
parameters of type basic_const_iterator<_It> into dependent ones of
type basic_const_iterator<_It3> where _It3 is constrained to match _It.
Thus the basic_const_iterator relop friends now get quickly discarded
during deduction and before the constraint check if the second operand
isn't a specialization of basic_const_iterator (or derived from one)
like before CWG 2369.
PR libstdc++/112490
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h (basic_const_iterator::operator<):
Replace non-dependent basic_const_iterator function parameter with
a dependent one of type basic_const_iterator<_It3> where _It3
matches _It.
(basic_const_iterator::operator>): Likewise.
(basic_const_iterator::operator<=): Likewise.
(basic_const_iterator::operator>=): Likewise.
* testsuite/24_iterators/const_iterator/112490.cc: New test.