Jørgen Kvalsvik [Sat, 7 Mar 2026 09:36:40 +0000 (10:36 +0100)]
Improve speed of masking table algorithm for MC/DC
The masking table was computed by considering the cartesian product of
incoming edges, ordering the pairs, and doing upwards BFS searches
from the successors of the lower topologically indexed ones (higher in
the graph). The problem with this approach is that all the nodes we
find from the higher candidates would also be found from the lower
candidates, and since we want to collect the set intersection, any
higher candidate is dominated by the lower candidates.
We need only consider adjacent elements in the sorted set of
candidates. This has a dramatic performance impact for large
functions. The worst case is expressions of the form (x && y && ...)
and (x || y || ...) with up to 64 conditions. I did a wallclock
comparison of the full analysis phase (including emitting the GIMPLE):
test.c:

int fn1 (int a[])
{
  (a[0] && a[1] && ...); // 64 times
  (a[0] && a[1] && ...); // 64 times
  ...                    // 500 times
}

int main ()
{
  int a[64];
  for (int i = 0; i != 10000; ++i)
    {
      for (int k = 0; k != 64; ++k)
        a[k] = i % (k + 1);
      fn1 (a);
    }
}
Without this patch:
fn1 instrumented in 20822.303 ms (41.645 ms per expression)
With this patch:
fn1 instrumented in 1288.548 ms (2.577 ms per expression)
I also tried considering terms left-to-right and, whenever the search
found an already-processed expression, stopping the search and just
inserting its complete table entry, but this had no measurable impact
on compile time, and the result was a slightly more complicated
function.
This inefficiency went unnoticed for a while, because these
expressions aren't very common. The most I've seen in the wild is 27
conditions, and that involved a lot of nested expressions which aren't
impacted as much.
gcc/ChangeLog:
* tree-profile.cc (struct conds_ctx): Add edges.
(topological_src_cmp): New function.
(masking_vectors): New search strategy.
We find (const_int 3 [0x3]) and a few others to be equivalent, among
them (reg:QI v1). This is a "fake set" that we create to help CSE extract
const_vector elements and reuse them. Element 0 is special, though.
We lowpart-subreg simplify it to (reg:QI v1) directly and, as the register
stays the same, consider it equivalent to (reg:V8QI v1).
Because both equivs refer to the same hard reg, in merge_equiv_classes, the
old (reg:V8QI) equiv is deleted and replaced by the new (reg:QI) one,
forgetting that the old equiv had 7 more elements.
Subsequently, extracting element 1 of a zero-extended QImode register results
in "0" instead of the correct "-4".
Therefore, this patch only uses those vec_select simplifications that
do not directly result in a register.
PR rtl-optimization/121649
gcc/ChangeLog:
* cse.cc (find_sets_in_insn): Only use non-reg vec_select
simplifications.
Martin Uecker [Thu, 19 Feb 2026 17:20:01 +0000 (18:20 +0100)]
c: Fix ICE related to tags and hardbool attribute [PR123856]
The hardbool attribute creates special enumeration types,
but the tag is not set correctly, which causes broken diagnostics
and an ICE with the new helper function to get the tag.
David Malcolm [Fri, 6 Mar 2026 23:47:05 +0000 (18:47 -0500)]
testsuite: fix ICEs in analyzer plugin with CPython >= 3.11 [PR107646,PR112520]
In GCC 14 the testsuite gained a plugin that "teaches" the analyzer
about the CPython API, trying to find common mistakes:
https://gcc.gnu.org/wiki/StaticAnalyzer/CPython
Unfortunately, this has been crashing for more recent versions of
CPython.
Specifically, in Python 3.11, PyObject's ob_refcnt was moved to an
anonymous union (as part of PEP 683 "Immortal Objects, Using a Fixed
Refcount"). The plugin attempts to find the field and fails, but has
no error handling, leading to a null pointer dereference.
Also, https://github.com/python/cpython/pull/101292 moved the "ob_digit"
from struct _longobject to a new field long_value of a new
struct _PyLongValue, leading to similar analyzer crashes when not
finding the field.
The following patch fixes this by
* looking within the anonymous union for the ob_refcnt field if it can't
find it directly
* gracefully handling the case of not finding "ob_digit" in PyLongObject
* doing more lookups once at plugin startup, rather than continuously on
analyzing API calls
* adding diagnostics and more error-handling to the plugin startup, so that
if it can't find something in the Python headers it emits a useful note
when disabling itself, e.g.
cc1: note: could not find field 'ob_digit' of CPython type 'PyLongObject' {aka 'struct _longobject'}
* replacing some copy-and-pasted code with member functions of a new
"class api" (though various other cleanups could be done)
Tested with:
* CPython 3.8: all tests continue to PASS
* CPython 3.13: fixes the ICEs, 2 FAILs remain (reference counting false
negatives)
Given that this is already a large patch, I'm opting to only fix the
crashes and defer the 2 remaining FAILs and other cleanups to followup
work.
gcc/analyzer/ChangeLog:
PR testsuite/112520
* region-model-manager.cc
(region_model_manager::get_field_region): Assert that the args are non-null.
gcc/testsuite/ChangeLog:
PR analyzer/107646
PR testsuite/112520
* gcc.dg/plugin/analyzer_cpython_plugin.cc: Move everything from
namespace ana:: into ana::cpython_plugin. Move global tree values
into a new "class api".
(pyobj_record): Replace with api.m_type_PyObject.
(pyobj_ptr_tree): Replace with api.m_type_PyObject_ptr.
(pyobj_ptr_ptr): Replace with api.m_type_PyObject_ptr_ptr.
(varobj_record): Replace with api.m_type_PyVarObject.
(pylistobj_record): Replace with api.m_type_PyListObject.
(pylongobj_record): Replace with api.m_type_PyLongObject.
(pylongtype_vardecl): Replace with api.m_vardecl_PyLong_Type.
(pylisttype_vardecl): Replace with api.m_vardecl_PyList_Type.
(get_field_by_name): Add "complain" param and use it to issue a
note on failure. Assert that type and name are non-null. Don't
crash on fields that are anonymous unions, and special-case
looking within them for "ob_refcnt" to work around the
Python 3.11 change for PEP 683 (immortal objects).
(get_sizeof_pyobjptr): Convert to...
(api::get_sval_sizeof_PyObject_ptr): ...this.
(init_ob_refcnt_field): Convert to...
(api::init_ob_refcnt_field): ...this.
(set_ob_type_field): Convert to...
(api::set_ob_type_field): ...this.
(api::init_PyObject_HEAD): New.
(api::get_region_PyObject_ob_refcnt): New.
(api::do_Py_INCREF): New.
(api::get_region_PyVarObject_ob_size): New.
(api::get_region_PyLongObject_ob_digit): New.
(inc_field_val): Convert to...
(api::inc_field_val): ...this.
(refcnt_mismatch::refcnt_mismatch): Add tree params for refcounts
and initialize corresponding fields. Fix whitespace.
(refcnt_mismatch::emit): Use stored tree values, rather than
assuming we have constants, and crashing on non-constants. Delete
commented-out dead code.
(refcnt_mismatch::foo): Delete.
(refcnt_mismatch::m_expected_refcnt_tree): New field.
(refcnt_mismatch::m_actual_refcnt_tree): New field.
(retrieve_ob_refcnt_sval): Simplify using class api.
(count_pyobj_references): Likewise.
(check_refcnt): Likewise. Don't warn on UNKNOWN values. Use
get_representative_tree for the expected and actual values and
skip the warning if it fails, rather than assuming we have
constants and crashing on non-constants.
(count_all_references): Update comment.
(kf_PyList_Append::impl_call_pre): Simplify using class api.
(kf_PyList_Append::impl_call_post): Likewise.
(kf_PyList_New::impl_call_post): Likewise.
(kf_PyLong_FromLong::impl_call_post): Likewise.
(get_stashed_type_by_name): Emit note if the type couldn't be
found.
(get_stashed_global_var_by_name): Likewise for globals.
(init_py_structs): Convert to...
(api::init_from_stashed_types): ...this. Bail out with an error
code if anything fails. Look up more things at startup, rather
than during analysis of calls.
(ana::cpython_analyzer_events_subscriber): Rename to...
(ana::cpython_plugin::analyzer_events_subscriber): ...this.
(analyzer_events_subscriber::analyzer_events_subscriber):
Initialize m_init_failed.
(analyzer_events_subscriber::on_message<on_tu_finished>):
Update for conversion of init_py_structs to
api::init_from_stashed_types and bail if it fails.
(analyzer_events_subscriber::on_message<on_frame_popped>): Don't
run if plugin initialization failed.
(analyzer_events_subscriber::m_init_failed): New field.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Patrick Palka [Fri, 6 Mar 2026 22:59:11 +0000 (17:59 -0500)]
c++: ICE mangling C auto... tparm [PR124297]
After r16-7491, the constraint on a C auto... tparm is represented as a
fold-expression (in TEMPLATE_PARM_CONSTRAINTS) instead of a concept-id (in
PLACEHOLDER_TYPE_CONSTRAINTS). So we now need to strip this fold-expression
before calling write_type_constraint, like we do in the type template
parameter case a few lines below.
PR c++/124297
gcc/cp/ChangeLog:
* mangle.cc (write_template_param_decl) <case PARM_DECL>:
Strip fold-expression before calling write_type_constraint.
Andrew Pinski [Tue, 17 Feb 2026 22:03:44 +0000 (14:03 -0800)]
aarch64: Fix uint64_t[8] usage after including "arm_neon.h" [PR124126]
aarch64_init_ls64_builtins_types currently creates an array with type uint64_t[8]
and then sets its mode to V8DI. The problem here is that if that array
type had already been used before, it would have a mode of BLK, and the
two uses then disagree about the type's mode.
This causes an ICE in some cases: with the C++ front-end with -g you
get "type variant differs by TYPE_MODE", and in some cases even without
-g, "canonical types differ for identical types".
The fix is to do a build_distinct_type_copy of the array type in
aarch64_init_ls64_builtins_types before assigning the mode to that copy.
This keeps the ls64 structures correct, and user-provided arrays are
not affected when "arm_neon.h" is included.
Build and tested on aarch64-linux-gnu.
PR target/124126
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_ls64_builtins_types): Copy
the array type before setting the mode.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/pr124126-1.C: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
The IR for p->c[12] is:
(.ACCESS_WITH_SIZE (p->c, &p->b, 0B, 4) + 48) = 2;
The current routine get_index_from_offset in c-family/c-ubsan.cc cannot
handle the integer constant offset "48" correctly.
The fix is to enhance "get_index_from_offset" to correctly handle the constant
offset.
PR c/124230
gcc/c-family/ChangeLog:
* c-ubsan.cc (get_index_from_offset): Handle the special case when
the offset is an integer constant.
gcc/testsuite/ChangeLog:
* gcc.dg/ubsan/pointer-counted-by-bounds-124230-char.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-124230-float.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-124230-struct.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-124230-union.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-124230.c: New test.
Andrew Pinski [Fri, 6 Mar 2026 19:22:56 +0000 (11:22 -0800)]
c: Fix pragma inside a pragma [PR97991]
After r0-72806-gbc4071dd66fd4d, c_parser_consume_token asserts that it
is not consuming a CPP_PRAGMA token, but pragma processing calls
pragma_lex, which in turn calls c_parser_consume_token. In the case of
a pragma with expansion (redefine_extname, message and sometimes pack
[and some target specific pragmas]) the expanded tokens can include
CPP_PRAGMA. We should just allow it instead of asserting.
This follows what the C++ front-end does, and we no longer have an ICE.
Bootstrapped and tested on x86_64-linux-gnu.
PR c/97991
gcc/c/ChangeLog:
* c-parser.cc (c_parser_consume_token): Allow
CPP_PRAGMA if inside a pragma.
gcc/testsuite/ChangeLog:
* c-c++-common/cpp/pr97991-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Saurabh Jha [Mon, 16 Feb 2026 14:11:58 +0000 (14:11 +0000)]
aarch64: mingw: Fix regression in C++ support
Fixes regression in C++ support without exception handling by:
1. Moving Makefile fragment config/i386/t-seh-eh to
config/mingw/t-seh-eh that handles C++ exception handling. This is
sufficient to fix the regression even if the exception handling
itself is not implemented yet.
2. Changing existing references to t-seh-eh in libgcc/config.host and
adding it for aarch64-*-mingw*.
With these changes, the compiler can now be built with C and C++.
This doesn't add support for Structured Exception Handling (SEH)
which will be done separately.
libgcc/ChangeLog:
* config.host: Set tmake_eh_file for aarch64-*-mingw* and update
it for x86_64-*-mingw* and x86_64-*-cygwin*.
* config/i386/t-seh-eh: Move to...
* config/mingw/t-seh-eh: ...here.
* config/aarch64/t-no-eh: Removed.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/mingw/mingw.exp: Add support for C++ files.
* gcc.target/aarch64/mingw/minimal_new_del.C: New test.
Jakub Jelinek [Fri, 6 Mar 2026 13:33:19 +0000 (14:33 +0100)]
testsuite: Add testcase for already fixed PR [PR122000]
This testcase started to be miscompiled with my r15-9131 change
on arm with -march=armv7-a -mfpu=vfpv4 -mfloat-abi=hard -O and got
fixed with r16-6548 PR121773 change.
2026-03-06 Jakub Jelinek <jakub@redhat.com>
PR target/122000
* gcc.c-torture/execute/pr122000.c: New test.
Nathan Myers [Fri, 6 Mar 2026 10:33:04 +0000 (05:33 -0500)]
libstdc++: bitset _GLIBCXX_ASSERTIONS op[] fixes
C++11 forbids a compound statement, as seen in the definition
of __glibcxx_assert(), in a constexpr function. This patch
open-codes the assertion in `bitset<>::operator[] const` for
C++11 to fix a failure in `g++.old-deja/g++.martin/bitset1.C`.
Also, it adds `{ dg-do compile }` in another test to suppress
a spurious UNRESOLVED complaint.
Code size tests on Arm are notoriously flaky because there are
numerous ISA variants (Arm, Thumb-1 and Thumb-2) to consider in
addition to a number of other variants from multiple sub-architecture
and micro-architectural tuning options. In combination this means
that we have continuous testsuite churn if the constraints are tight
enough to detect real regressions.
So this patch eliminates most of these checks, except where the code
size test is the only test that is done (other than the compilation
itself). Where that is the case I've tightened the compiler options
to limit the test to one set of architecture flags, thereby
eliminating most of the sources of variation.
In some cases I've replaced a code-size check with some other test of
the output, based on the intent of the original patch that motivated
the test. For example, the max-insns-skipped test now checks that an
IT instruction is not generated rather than checking the size of the
binary (which was a side-effect of not generating IT).
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add arm_arch_v7a_thumb.
* gcc.target/arm/ifcvt-size-check.c: Add options to force thumb1.
* gcc.target/arm/ivopts-2.c: Remove object size check.
* gcc.target/arm/ivopts-3.c: Likewise.
* gcc.target/arm/ivopts-4.c: Likewise.
* gcc.target/arm/ivopts-5.c: Likewise.
* gcc.target/arm/ivopts.c: Likewise.
* gcc.target/arm/max-insns-skipped.c: Scan for absence of an IT
instruction. Remove object size check. Use arm_arch_v7a_thumb.
* gcc.target/arm/pr43597.c: Remove object size check and use
arm_arch_v7a_thumb.
* gcc.target/arm/pr63210.c: Use arm_arch_v5t_thumb options.
* gcc.target/arm/split-live-ranges-for-shrink-wrap.c: Remove
object size check and use arm_arch_v5t_thumb options.
arm: testsuite: Fix typo on target arm_cpu_cortex_a53
When testing the effective target these tests were using the wrong
name since they omitted the trailing _ok. This was causing some tests
to fail to execute correctly.
gcc/testsuite/ChangeLog:
* gcc.target/arm/aes-fuse-1.c: Add _ok to the effective_target.
* gcc.target/arm/aes-fuse-2.c: Likewise.
Jonathan Wakely [Wed, 4 Mar 2026 10:54:16 +0000 (10:54 +0000)]
libstdc++: Use aligned new for filesystem::path internals [PR122300]
As Bug 122300 shows, we have at least one target where the
static_assert added by r16-4422-g1b18a9e53960f3 fails. This patch
resurrects the original proposal for using aligned new that I posted in
https://gcc.gnu.org/pipermail/libstdc++/2025-October/063904.html
Instead of just asserting that the memory from operator new will be
sufficiently aligned, check whether it will be and use aligned new if
needed. We don't just use aligned new unconditionally, because that can
add overhead on targets where malloc already meets the requirements.
libstdc++-v3/ChangeLog:
PR libstdc++/122300
* src/c++17/fs_path.cc (path::_List::_Impl): Remove
static_asserts.
(path::_List::_Impl::required_alignment)
(path::_List::_Impl::use_aligned_new): New static data members.
(path::_List::_Impl::create_unchecked): Check use_aligned_new
and use aligned new if needed.
(path::_List::_Impl::alloc_size): New static member function.
(path::_List::_Impl_deleter::operator()): Check use_aligned_new
and use aligned delete if needed.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jakub Jelinek [Fri, 6 Mar 2026 09:32:00 +0000 (10:32 +0100)]
tree-inline: Fix up ICE on !is_gimple_reg is_gimple_reg_type copying [PR124135]
The first testcase below ICEs e.g. with -O2 on s390x-linux, the
second with -O2 -m32 on x86_64-linux. We have
<bb 2> [local count: 1073741824]:
if (x_4(D) != 0)
goto <bb 3>; [33.00%]
else
goto <bb 4>; [67.00%]
<bb 4> [local count: 1073741824]:
return <retval>;
on a target where <retval> has gimple reg type but is
aggregate_value_p and TREE_ADDRESSABLE too.
fnsplit splits this into
<bb 2> [local count: 354334800]:
_1 = qux (42);
foo (0, &<retval>, _1);
<bb 3> [local count: 354334800]:
return <retval>;
in the *.part.0 function and
if (x_4(D) != 0)
goto <bb 3>; [33.00%]
else
goto <bb 4>; [67.00%]
<bb 4> [local count: 1073741824]:
return <retval>;
in the original function. Now, dunno if already that isn't
invalid because <retval> has TREE_ADDRESSABLE set in the latter, but
at least it is accepted by tree-cfg.cc verification.
tree lhs = gimple_call_lhs (stmt);
if (lhs
&& (!is_gimple_reg (lhs)
&& (!is_gimple_lvalue (lhs)
|| verify_types_in_gimple_reference
(TREE_CODE (lhs) == WITH_SIZE_EXPR
? TREE_OPERAND (lhs, 0) : lhs, true))))
{
error ("invalid LHS in gimple call");
return true;
}
While lhs is not is_gimple_reg, it is is_gimple_lvalue here.
Now, inlining of the *.part.0 fn back into the original results
in
<retval> = a;
statement which already is diagnosed by verify_gimple_assign_single:
case VAR_DECL:
case PARM_DECL:
if (!is_gimple_reg (lhs)
&& !is_gimple_reg (rhs1)
&& is_gimple_reg_type (TREE_TYPE (lhs)))
{
error ("invalid RHS for gimple memory store: %qs", code_name);
debug_generic_stmt (lhs);
debug_generic_stmt (rhs1);
return true;
}
__float128/long double are is_gimple_reg_type, but both operands
aren't is_gimple_reg.
The following patch fixes it by doing a separate load and store, i.e.
_42 = a;
<retval> = _42;
in this case. If we want to change verify_gimple_assign to disallow
!is_gimple_reg (lhs) for is_gimple_reg_type (TREE_TYPE (lhs)), we'd
need to change fnsplit instead, but I'd be afraid such a change would
be more stage1 material (and certainly nothing that should be
even backported to release branches).
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/124135
* tree-inline.cc (expand_call_inline): If both gimple_call_lhs (stmt)
and use_retvar aren't gimple regs but have gimple reg type, use
separate load of use_retvar into an SSA_NAME and then a store of it
into gimple_call_lhs (stmt).
* g++.dg/torture/pr124135-1.C: New test.
* g++.dg/torture/pr124135-2.C: New test.
Jakub Jelinek [Fri, 6 Mar 2026 07:14:09 +0000 (08:14 +0100)]
match.pd: Move cast into p+ operand for (ptr) (x p+ y) p+ z -> (ptr) (x p+ (y + z)) [PR124358]
The following testcase is miscompiled since my r12-6382 change, because
it doesn't play well with the gimple_fold_indirect_ref function which uses
STRIP_NOPS and then has
/* *(foo *)fooarrptr => (*fooarrptr)[0] */
if (TREE_CODE (TREE_TYPE (subtype)) == ARRAY_TYPE
&& TREE_CODE (TYPE_SIZE (TREE_TYPE (TREE_TYPE (subtype)))) == INTEGER_CST
&& useless_type_conversion_p (type, TREE_TYPE (TREE_TYPE (subtype))))
{
tree type_domain;
tree min_val = size_zero_node;
tree osub = sub;
sub = gimple_fold_indirect_ref (sub);
if (! sub)
sub = build1 (INDIRECT_REF, TREE_TYPE (subtype), osub);
type_domain = TYPE_DOMAIN (TREE_TYPE (sub));
if (type_domain && TYPE_MIN_VALUE (type_domain))
min_val = TYPE_MIN_VALUE (type_domain);
if (TREE_CODE (min_val) == INTEGER_CST)
return build4 (ARRAY_REF, type, sub, min_val, NULL_TREE, NULL_TREE);
}
Without the GENERIC
#if GENERIC
(simplify
(pointer_plus (convert:s (pointer_plus:s @0 @1)) @3)
(convert:type (pointer_plus @0 (plus @1 @3))))
#endif
we have INDIRECT_REF of POINTER_PLUS_EXPR with int * type of NOP_EXPR
to that type of POINTER_PLUS_EXPR with pointer to int[4] ARRAY_TYPE, so
gimple_fold_indirect_ref doesn't create the ARRAY_REF.
But with it, it is simplified to NOP_EXPR to int * type from
POINTER_PLUS_EXPR with pointer to int[4] ARRAY_TYPE, the NOP_EXPR is
skipped over by STRIP_NOPS and the above code triggers.
The following patch fixes it by swapping the order: do the NOP_EXPR
inside of the POINTER_PLUS_EXPR's first argument instead of a NOP_EXPR
of the POINTER_PLUS_EXPR.
2026-03-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/124358
* match.pd ((ptr) (x p+ y) p+ z -> (ptr) (x p+ (y + z))): Simplify
into (ptr) x p+ (y + z) instead.
Andrew Pinski [Fri, 6 Mar 2026 05:54:44 +0000 (21:54 -0800)]
testsuite/aarch64: Add testcase for already fixed bug [PR124078]
This big-endian testcase started to ICE with r16-7464-g560766f6e239a8
and then started to work with r16-7506-g498983d9619351.
So it seems like a good idea to add the testcase for this
so it does not break again.
Pushed as obvious after a quick test to make sure it ICEd
before and it is passing now on aarch64-linux-gnu.
PR rtl-optimization/124078
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr124078-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jakub Jelinek [Thu, 5 Mar 2026 20:43:55 +0000 (21:43 +0100)]
c++: Avoid caching TARGET_EXPR slot value if exception is thrown from TARGET_EXPR_INITIAL [PR124145]
The following testcase is miscompiled: we throw an exception only
during the first bar () call and not during the second, and in that
case we reach the inline asm.
The problem is that the TARGET_EXPR handling calls
ctx->global->put_value (new_ctx.object, new_ctx.ctor);
first for aggregate/vectors, then
if (is_complex)
/* In case no initialization actually happens, clear out any
void_node from a previous evaluation. */
ctx->global->put_value (slot, NULL_TREE);
and then recurses on TARGET_EXPR_INITIAL.
Even for is_complex it can actually store partially the result in the
slot before throwing.
When TARGET_EXPR_INITIAL doesn't throw, we do
if (ctx->save_expr)
ctx->save_expr->safe_push (slot);
and that arranges for the value in slot be invalidated at the end of
surrounding CLEANUP_POINT_EXPR.
But in case when it does throw this isn't done.
The following patch fixes it by moving that push to save_expr
before the if (*jump_target) return NULL_TREE; check.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR c++/124145
* constexpr.cc (cxx_eval_constant_expression) <case TARGET_EXPR>: Move
ctx->save_expr->safe_push (slot) call before if (*jump_target) test.
Use TARGET_EXPR_INITIAL instead of TREE_OPERAND.
a68: fix wrapping C functions returning void [PR algol68/124322]
This patch fixes a68_wrap_formal_proc_hole so it doesn't assume that
wrapped C functions returning void return Algol 68 void values, which
are empty records.
Tested on i686-linux-gnu and x86_64-linux-gnu.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
Alice Carlotti [Wed, 4 Mar 2026 14:58:21 +0000 (14:58 +0000)]
aarch64 libgcc: Fix mingw build [PR124333]
Make __aarch64_cpu_features unconditionally available. This permits the
unconditional use of this global inside __arm_get_current_vg, which was
introduced in r16-7637-g41b4a73f370116.
For now this global is only initialised when <sys/auxv.h> is available,
but we can extend this in future to support other ways of initialising
the bits used for SME support, and use this to remove __aarch64_have_sme.
This approach was recently adopted by LLVM.
This patch does introduce an inconsistency with __aarch64_have_sme when
<sys/auxv.h> is unavailable. However, this doesn't introduce any
regressions, because one of the following conditions will hold:
1. SVE is enabled at compile time whenever we use a streaming or
streaming compatible function. In this case the compiler won't need to
use __arm_get_current_vg, so it doesn't matter if it gives the wrong
answer.
2. There is a use of a streaming or streaming compatible function when
we don't know whether SVE is enabled. In order to get correct DWARF
unwind information, we then have to be able to test for SVE availability
at runtime. This isn't possible until a working __arm_get_current_vg
implementation is available, so the configuration has never (yet) been
supported.
vect: fix vectorization of non-gather elementwise loads [PR124037]
For the vectorization of non-contiguous memory accesses such as the
vectorization of loads from a particular struct member, specifically
when vectorizing with unknown bounds (thus using a pointer and not an
array) it is observed that inadequate alignment checking allows for
the crossing of a page boundary within a single vectorized loop
iteration. This leads to potential segmentation faults in the
resulting binaries: the loads were emitted without any proper address
alignment checks on the starting address or on whether alignment is
preserved across iterations. We therefore fix the handling of such
cases.
To correct this, we modify the logic in `get_load_store_type',
particularly the logic responsible for ensuring we don't read more
than the scalar code would in the context of early breaks, extending
it from handling only gather-scatter and strided SLP accesses to also
properly handling element-wise accesses, wherein we specify
specify that these need correct block alignment, thus promoting their
`alignment_support_scheme' from `dr_unaligned_supported' to
`dr_aligned'.
gcc/ChangeLog:
PR tree-optimization/124037
* tree-vect-stmts.cc (get_load_store_type): Fix
alignment_support_scheme categorization for early
break VMAT_ELEMENTWISE accesses.
The following fixes a regression introduced by r11-5542 which
restricts replacing uses of live original defs of now vectorized
stmts to when that does not require new loop-closed PHIs to be
inserted. That restriction keeps the original scalar definition
live, which is sub-optimal and also not reflected in costing.
The particular case the following fixes, which can be seen in
gcc.dg/vect/bb-slp-57.c, is the case where we are replacing an
existing loop-closed PHI argument.
PR tree-optimization/98064
* tree-vect-loop.cc (vectorizable_live_operation): Do
not restrict replacing uses in a LC PHI.
* gcc.dg/vect/bb-slp-57.c: Verify we do not keep original
stmts live.
Jakub Jelinek [Thu, 5 Mar 2026 12:11:39 +0000 (13:11 +0100)]
libiberty: Copy over .ARM.attributes section into *.debug.temp.o files [PR124365]
If gcc is configured on aarch64-linux against new binutils, such as
2.46, it doesn't emit into assembly markings like
.section .note.gnu.property,"a"
.align 3
.word 4
.word 16
.word 5
.string "GNU"
.word 0xc0000000
.word 4
.word 0x7
.align 3
but instead emits
.aeabi_subsection aeabi_feature_and_bits, optional, ULEB128
.aeabi_attribute Tag_Feature_BTI, 1
.aeabi_attribute Tag_Feature_PAC, 1
.aeabi_attribute Tag_Feature_GCS, 1
The former goes into the .note.gnu.property section, the latter into
the .ARM.attributes section.
Now, when linking without LTO or with LTO but without -g, all behaves
for the linked binaries the same, say for test.c
int main () {}
$ gcc -g -mbranch-protection=standard test.c -o test; readelf -j .note.gnu.property test
Displaying notes found in: .note.gnu.property
Owner Data size Description
GNU 0x00000010 NT_GNU_PROPERTY_TYPE_0
Properties: AArch64 feature: BTI, PAC, GCS
$ gcc -flto -mbranch-protection=standard test.c -o test; readelf -j .note.gnu.property test
Displaying notes found in: .note.gnu.property
Owner Data size Description
GNU 0x00000010 NT_GNU_PROPERTY_TYPE_0
Properties: AArch64 feature: BTI, PAC, GCS
$ gcc -flto -g -mbranch-protection=standard test.c -o test; readelf -j .note.gnu.property test
readelf: Warning: Section '.note.gnu.property' was not dumped because it does not exist
The problem is that the *.debug.temp.o object files created by lto-wrapper
don't have these markings. The function copies over the .note.GNU-stack
section (so that on most arches it doesn't similarly break the
PT_GNU_STACK segment flags), and .note.gnu.property (which used to hold
this stuff e.g. on aarch64 or x86, added in PR93966). But it doesn't
copy the new .ARM.attributes section.
The following patch fixes it by copying that section too. The function
unfortunately only works on names, doesn't know if it is copying ELF or some
other format (PE, Mach-O) or if it is copying ELF, whether it is EM_AARCH64
or some other arch. The following patch just copies the section always,
I think it is very unlikely people would use .ARM.attributes section for
some random unrelated stuff. If we'd want to limit it to just EM_AARCH64,
guess it would need to be done in
libiberty/simple-object-elf.c (simple_object_elf_copy_lto_debug_sections)
instead as an exception for the (*pfn) callback results (and there it could
e.g. verify SHT_AARCH64_ATTRIBUTES type but even there dunno if it has
access to the Ehdr stuff).
No testcase from me, dunno if e.g. the linker can flag the lack of those
during linking with some option rather than using readelf after link and
what kind of effective targets we'd need for such a test.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR target/124365
* simple-object.c (handle_lto_debug_sections): Also copy over
.ARM.attributes section.
Tomasz Kamiński [Thu, 5 Mar 2026 07:57:24 +0000 (08:57 +0100)]
libstdc++: Fix atomic/cons/zero_padding.cc test for arm-none-eabi [PR124124]
The test uses dg-require-atomic-cmpxchg-word, which checks whether
atomic compare exchange is available for pointer-sized integers, and
then tests types that are eight bytes in size. This causes issues for
targets where pointers are four bytes and libatomic is not present,
like arm-none-eabi.
This patch addresses this by using short members in TailPadding and
MidPadding instead of int. This reduces the size of the types to four
bytes, while keeping padding bytes present.
PR libstdc++/124124
libstdc++-v3/ChangeLog:
* testsuite/29_atomics/atomic/cons/zero_padding.cc: Limit size of
test types to four bytes.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Wed, 25 Feb 2026 11:15:08 +0000 (12:15 +0100)]
libstdc++: Remove UB in _Arg_value union alternative assignment
The _Arg_value::_M_set method initialized the union member by
assigning through a reference to that member produced by _M_get(*this).
However, per language rules, such an assignment has undefined behavior
if the alternative was not already active, the same as for any object
not within its lifetime.
To address the above, we modify _M_set to use placement new for the
class types, and invoke _S_access with two arguments for all other
types. The _S_access (rename of _S_get) is modified to assign the value
of the second parameter (if provided) to the union member. Such direct
assignments are treated specially in the language (see N5032
[class.union.general] p5), and will start the lifetime of a trivially
default constructible alternative.
libstdc++-v3/ChangeLog:
* include/std/format (_Arg_value::_M_get): Rename to...
(_Arg_value::_M_access): Modified to accept optional
second parameter that is assigned to value.
(_Arg_value::_M_get): Handle rename.
(_Arg_value::_M_set): Use construct_at for basic_string_view,
handle, and two-argument _S_access for other types.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Ivan Lazaric <ivan.lazaric1@gmail.com>
Co-authored-by: Ivan Lazaric <ivan.lazaric1@gmail.com>
Jakub Jelinek [Thu, 5 Mar 2026 10:23:24 +0000 (11:23 +0100)]
i386: Make -masm={att,intel} xchg operand order consistent
While in this case it is neither an assembly failure nor wrong code,
because say xchgl %eax, %edx and xchg eax, edx do the same thing,
they are encoded differently. So if we want consistency between
-masm=att and -masm=intel emitted code (my understanding is that
this is what Zdenek is testing right now: fuzzing code, compiling
with both -masm=att and -masm=intel and making sure that if the former
assembles, the latter does too and they result in identical
*.o files), we should use a different order of the operands
even here (and it doesn't matter which order we pick).
I've grepped the *.md files with
grep '\\t%[0-9], %[0-9]' *.md | grep -v '%0, %0'
i386.md: "xchg{<imodesuffix>}\t%1, %0"
i386.md: xchg{<imodesuffix>}\t%1, %0
i386.md: "wrss<mskmodesuffix>\t%0, %1"
i386.md: "wruss<mskmodesuffix>\t%0, %1"
(before this and PR124366 fix) and later on also with
grep '\\t%[a-z0-9_<>]*[0-9], %[a-z0-9_<>]*[0-9]' *.md | grep -v '%0, %0'
and checked all the output and haven't found anything else problematic.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
* config/i386/i386.md (swap<mode>): Swap operand order for
-masm=intel.
Tomasz Kamiński [Tue, 24 Feb 2026 07:08:58 +0000 (08:08 +0100)]
libstdc++: Store basic_format_arg::handle in __format::_Arg_value
This patch changes the type of the _M_handle member of __format::_Arg_value
from the __format::_HandleBase union member to basic_format_arg<_Context>::handle.
This allows handle to be stored (using placement new) inside _Arg_value at
compile time, as the type of the _M_handle member now matches the stored
object.
In addition to the above, to make handle usable at compile time, we adjust
the _M_func signature to match the stored function, avoiding the need
for a reinterpret_cast.
To avoid a cyclic dependency, where basic_format_arg<_Context> requires
instantiating _Arg_value<_Context> for its _M_val member, which in turn
requires basic_format_arg<_Context>::handle, we define handle as a nested
class inside _Arg_value and change basic_format_arg<_Context>::handle
to an alias for it.
Finally, the handle(_Tp&) constructor is now constrained to not accept
handle itself, as otherwise it would be used instead of the copy
constructor when constructing from handle&.
As _Arg_value is already templated on _Context, this change should not lead
to additional template instantiations.
libstdc++-v3/ChangeLog:
* include/std/format (__Arg_value::handle): Define, extracted
with modification from basic_format_arg::handle.
(_Arg_value::_Handle_base): Remove.
(_Arg_value::_M_handle): Change type to handle.
(_Arg_value::_M_get, _Arg_value::_M_set): Check for handle
type directly, and return result unmodified.
(basic_format_arg::__formattable): Remove.
(basic_format_arg::handle): Replace with alias to
_Arg_value::handle.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
PR 123807 turns out to be a special case of the middle-end PR 124250.
The previous ad-hoc fix is unneeded now since the underlying middle-end
issue is fixed, so revert it but keep the test case.
Jakub Jelinek [Thu, 5 Mar 2026 09:05:44 +0000 (10:05 +0100)]
i386: Fix up last -masm=intel operand of vcvthf82ph [PR124349]
gas expects for this instruction
vcvthf82ph xmm30, QWORD PTR [r9]
vcvthf82ph ymm30, XMMWORD PTR [r9]
vcvthf82ph zmm30, YMMWORD PTR [r9]
i.e. the memory size is half of the dest register size.
We currently emit it correctly for the last 2 forms but emit XMMWORD PTR
for the first one too. So, we need %q1 for V8HF, and for V16HF/V32HF
we can use either just %1 or %x1/%t1. There is no define_mode_attr
that would provide those, so I've added one just for this insn.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR target/124349
* config/i386/sse.md (iptrssebvec_2): New define_mode_attr.
(cvthf82ph<mode><mask_name>): Use it for -masm=intel input
operand.
Jakub Jelinek [Thu, 5 Mar 2026 08:35:39 +0000 (09:35 +0100)]
i386: Fix operand order for @wrss<mode> and @wruss<mode> [PR124366]
These two insns were using the same operand order for both -masm=att
and -masm=intel, which is ok if using the same operand for both, but not
when they are different.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR target/124366
* config/i386/i386.md (@wrss<mode>, @wruss<mode>): Swap operand
order for -masm=intel.
Jakub Jelinek [Thu, 5 Mar 2026 08:19:59 +0000 (09:19 +0100)]
c++: Fix up handling of unnamed types named by typedef for linkage purposes for -freflection [PR123810]
As mentioned in the PR, we ICE on the following testcase, and if members_of
isn't called on a class with e.g. typedef struct { int d; } D;, we don't
handle it correctly, e.g. we say ^^C::D is not a type alias, or for
members_of in a namespace that there aren't two entities, the struct itself
and the type alias for it.
This is because name_unnamed_type handles the naming of an unnamed type
through typedef for linkage purposes (where we originally have
a TYPE_DECL with IDENTIFIER_ANON_P DECL_NAME for the type) by replacing
all occurrences of TYPE_NAME on the type from the old TYPE_DECL to the new
TYPE_DECL with the user provided name.
The ICE for members_of (^^C, uctx) is then because we see two TYPE_DECLs
(one with IDENTIFIER_ANON_P, one with the user name) with the same
TREE_TYPE, enter the same thing twice into what we want to return, and ICE
in the comparison routine. Anyway, for is_type_alias purposes, there is no
is_typedef_decl and there can't be, because the same TYPE_DECL is used as
the TYPE_NAME of both the type proper and its alias. Without reflection we
didn't care about the difference.
So, the following patch changes name_unnamed_type to do things differently,
but only for -freflection, because 1) I don't want to break stuff late in
stage4 2) without reflection we don't really need it and don't need to
pay the extra memory cost by having another type which is the type alias.
The change is that instead of
TYPE_DECL .anon_NN
| TREE_TYPE
v
type <----------+
| TYPE_NAME |
v |
TYPE_DECL D |
| TREE_TYPE |
+-------------+
where for class context both TYPE_DECLs are in TYPE_FIELDS and for
namespace context only the latter one is (as pushdecl ignores the
IDENTIFIER_ANON_P one) we have
TYPE_DECL D TYPE_DECL D --- DECL_ORIGINAL_TYPE
| TREE_TYPE | TREE_TYPE |
v v |
type variant_type |
^-------------------------------+
which is except for the same DECL_NAME on both TYPE_DECLs exactly what
is used for typedef struct D_ { int d; } D;
Various spots have been testing for the typedef name for linkage purposes
cases and were using tests like:
OVERLOAD_TYPE_P (TREE_TYPE (value))
&& value == TYPE_NAME (TYPE_MAIN_VARIANT (TREE_TYPE (value)))
So that this can be tested, this patch introduces a new decl_flag on
the TYPE_DECLs and marks for -freflection both of these TYPE_DECLs
(and for -fno-reflection the one without IDENTIFIER_ANON_P name).
It is easy to differentiate between the two, the first one is also
DECL_IMPLICIT_TYPEDEF_P, the latter is not (and on the other side
has DECL_ORIGINAL_TYPE non-NULL).
For name lookup in namespaces, nothing special needs to be done,
because the originally IDENTIFIER_ANON_P TYPE_DECL wasn't added
to the bindings, at block scope I had to deal with it in pop_local_binding
because it was unhappy that it got renamed. And finally for class
scopes, we need to arrange for the latter TYPE_DECL to be found, but
currently it is the second one. The patch currently skips the first one for
name lookup in fields_linear_search and arranges for count_class_fields
and member_vec_append_class_fields to also ignore the first one. I wonder
if the latter two shouldn't also ignore any other IDENTIFIER_ANON_P
TYPE_FIELDS chain decls, or do we ever perform name lookup for the anon
identifiers? Another option for fields_linear_search would be to try to
swap the order of the two TYPE_DECLs in the TYPE_FIELDS chain somewhere
in grokfield.
Anyway, the changes result in minor emitted DWARF changes, say for
g++.dg/debug/dwarf2/typedef1.C without -freflection there is
.uleb128 0x4 # (DIE (0x46) DW_TAG_enumeration_type)
.long .LASF6 # DW_AT_name: "typedef foo<1>::type type"
.byte 0x7 # DW_AT_encoding
.byte 0x4 # DW_AT_byte_size
.long 0x70 # DW_AT_type
.byte 0x1 # DW_AT_decl_file (typedef1.C)
.byte 0x18 # DW_AT_decl_line
.byte 0x12 # DW_AT_decl_column
.long .LASF7 # DW_AT_MIPS_linkage_name: "N3fooILj1EE4typeE"
...
and no typedef, while with -freflection there is
.uleb128 0x3 # (DIE (0x3a) DW_TAG_enumeration_type)
.long .LASF5 # DW_AT_name: "type"
.byte 0x7 # DW_AT_encoding
.byte 0x4 # DW_AT_byte_size
.long 0x6c # DW_AT_type
.byte 0x1 # DW_AT_decl_file (typedef1.C)
.byte 0x18 # DW_AT_decl_line
.byte 0x12 # DW_AT_decl_column
...
.uleb128 0x5 # (DIE (0x57) DW_TAG_typedef)
.long .LASF5 # DW_AT_name: "type"
.byte 0x1 # DW_AT_decl_file (typedef1.C)
.byte 0x18 # DW_AT_decl_line
.byte 0x1d # DW_AT_decl_column
.long 0x3a # DW_AT_type
so, different DW_AT_name on the DW_TAG_enumeration_type, missing
DW_AT_MIPS_linkage_name and an extra DW_TAG_typedef. While in theory
I could work harder to hide that detail, I actually think it is a good
thing to have it the latter way because it represents more exactly
what is going on.
Another slight change is different locations in some diagnostics
on g++.dg/lto/odr-3 test (location of the unnamed struct vs. locations
of the typedef name given to it without -freflection), and a module
issue which Nathan has some WIP patch for in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123810#c11
In any case, none of those differences show up in normal testsuite runs
currently (as those tests aren't compiled with -freflection), if/when
-freflection becomes the default for -std=c++26 we can deal with the
DWARF one as well as different locations in odr-3 and for modules I was
hoping it could be handled incrementally. I'm not even sure what should
happen if one TU has struct D { int d; }; and another one has
typedef struct { int d; } D;, shall that be some kind of error? Though
right now typedef struct { int d; } D; in both results in an error too
and that definitely needs to be handled.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR c++/123810
* cp-tree.h (TYPE_DECL_FOR_LINKAGE_PURPOSES_P): Define.
(TYPE_DECL_WAS_UNNAMED): Likewise.
(TYPE_WAS_UNNAMED): Also check TYPE_DECL_WAS_UNNAMED.
* decl.cc (start_decl): Use TYPE_DECL_FOR_LINKAGE_PURPOSES_P.
(maybe_diagnose_non_c_class_typedef_for_l): If t == type, use
DECL_SOURCE_LOCATION (orig) instead of
DECL_SOURCE_LOCATION (TYPE_NAME (t)).
(name_unnamed_type): Set TYPE_DECL_FOR_LINKAGE_PURPOSES_P
on decl. For -freflection don't change TYPE_NAME from
orig to decl, but instead change DECL_NAME (orig) to
DECL_NAME (decl) and set TYPE_DECL_FOR_LINKAGE_PURPOSES_P on
orig too.
* decl2.cc (grokfield): Use TYPE_DECL_FOR_LINKAGE_PURPOSES_P.
* name-lookup.cc (fields_linear_search): Ignore
TYPE_DECL_WAS_UNNAMED decls.
(count_class_fields): Likewise.
(member_vec_append_class_fields): Likewise.
(pop_local_binding): Likewise.
* reflect.cc (namespace_members_of): For TYPE_DECL with
TYPE_DECL_FOR_LINKAGE_PURPOSES_P set also append
reflection of strip_typedefs (m).
* class.cc (find_flexarrays): Handle TYPE_DECLs with
TYPE_DECL_WAS_UNNAMED like the ones with IDENTIFIER_ANON_P
name.
* g++.dg/reflect/members_of10.C: New test.
* g++.dg/cpp2a/typedef1.C: Expect one message on a different line.
Marek Polacek [Wed, 4 Mar 2026 22:32:14 +0000 (17:32 -0500)]
c++/reflection: fix return value of meta::extent [PR124368]
std::meta::extent returns a size_t, but eval_extent returns either
size_zero_node or size_binop(), both of which are of type sizetype,
which is not the C/C++ size_t and so we don't pass the check in
cxx_eval_outermost_constant_expr:
/* Check we are not trying to return the wrong type. */
if (!same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (r)))
We should convert to size_type_node which represents the C/C++ size_t,
like for instance fold_sizeof_expr does.
PR c++/124368
gcc/cp/ChangeLog:
* reflect.cc (eval_extent): Convert the result to size_type_node.
So we just need to update the testcase, removing the xfail, and close this
bug as fixed.
The reason why this was not fixed until r16-101-g132d01d96ea9d6 is
because the call is from main which is known to be called once and was
not a candidate for IPA-CP until then.
In fact renaming the function from main to f (and adding a `return 0`
so not invoking undefined behavior), the scan-ipa-dump works all the
way back to GCC 5.
Tested on aarch64-linux-gnu and arm-linux-gnueabihf.
Marek Polacek [Mon, 2 Mar 2026 22:12:56 +0000 (17:12 -0500)]
c++: reusing typedefs in template for [PR124229]
This is a crash on code like:
template for (constexpr auto val : define_static_array (enumerators_of (^^E)))
{
constexpr auto a = annotations_of(val)[0];
using U = [:type_of(a):];
constexpr auto m1 = extract<U>(a);
}
because the template arg to extract wasn't substituted to "info".
Once I dug deeper I realized this problem isn't tied to Reflection:
we also crash here:
template for (constexpr auto val : { 42 })
{
using U = decltype(val);
foo<U>();
}
because we emit code for foo() that still has a DECLTYPE_TYPE in it.
The problem is in tsubst and reusing typedefs. Normally, for code like
template<typename T> void foo () {
using U = T;
U u;
}
we do the DECL_FUNCTION_SCOPE_P -> retrieve_local_specialization call.
This call only happens in function templates (that are not explicit
specializations), but the "template for" above are both in non-template
functions. So we end up returning the original tree:
/* The typedef is from a non-template context. */
return t;
It seems clear that this is the wrong thing to do, and that the
DECL_FUNCTION_SCOPE_P code should happen in this scenario as well.
[temp.decls.general] tells me that "For the purpose of name lookup and
instantiation, the compound-statement of an expansion-statement is
considered a template definition." so I'm guessing that we want to
check for an expansion-statement as well. As decl_dependent_p says,
in_expansion_stmt is false when instantiating, so I'm looking for
sk_template_for.
PR c++/124229
gcc/cp/ChangeLog:
* pt.cc (in_expansion_stmt_p): New.
(tsubst): When reusing typedefs, do retrieve_local_specialization also
when in_expansion_stmt_p is true.
gcc/testsuite/ChangeLog:
* g++.dg/cpp26/expansion-stmt32.C: New test.
* g++.dg/reflect/expansion-stmt2.C: New test.
Jakub Jelinek [Wed, 4 Mar 2026 18:22:29 +0000 (19:22 +0100)]
c++: Find annotations in DECL_ATTRIBUTES (TYPE_NAME (r)) for type aliases
On Wed, Feb 25, 2026 at 08:50:40PM +0100, Jakub Jelinek wrote:
> > Sounds like the maybe_strip_typedefs is wrong, since reflection in general
> > tries to preserve aliases.
>
> Actually the maybe_strip_typedefs call is correct, that is for the type
> argument (so when it is std::meta::annotations_with_type) and the standard
> says that dealias should be used
> - https://eel.is/c++draft/meta.reflection#annotation-6.2
> But we probably shouldn't use TYPE_ATTRIBUTES but DECL_ATTRIBUTES (TYPE_NAME (r))
> if r is a type alias.
> I'll test a patch for that separately.
Here it is.
2026-03-04 Jakub Jelinek <jakub@redhat.com>
PR c++/123866
* reflect.cc (eval_annotations_of): For type aliases look for
annotations in DECL_ATTRIBUTES (TYPE_NAME (r)).
Jakub Jelinek [Wed, 4 Mar 2026 16:12:29 +0000 (17:12 +0100)]
libgfortran: Fix up putenv uses in libcaf_shmem [PR124330]
I don't have access to HP-UX, but at least on other OSes, what Linux as
well as POSIX documents is that when you call putenv with some argument,
what that argument points to becomes part of the environment, and when
it is changed, the environment changes. I believe ENOMEM from putenv is
about reallocating the __environ (or similar) pointed-to array of pointers
(e.g. if the particular env var name isn't there already); it still
shouldn't allocate any memory for the NAME=VALUE string and should just
use the user-provided one. So, passing the address of an automatic array
will be UB as soon as the scope of that var is left.
One can either malloc the buffer, or use static vars; then nothing leaks,
and in the unlikely case putenv would be called twice for the same env var,
it would the second time only register the same buffer.
2026-03-04 Jakub Jelinek <jakub@redhat.com>
PR libfortran/124330
* caf/shmem/shared_memory.c (shared_memory_set_env): Make buffer
used by putenv static.
(shared_memory_init): Likewise.
Andrew Pinski [Tue, 3 Mar 2026 21:57:47 +0000 (13:57 -0800)]
widen mult: Fix handling of _Fract mixed with _Fract [PR119568]
The problem here is we try calling find_widening_optab_handler_and_mode
with to_mode=E_USAmode and from_mode=E_UHQmode. This causes an ICE (with checking only).
The fix is to reject the case where the mode classes are different in convert_plusminus_to_widen
before even trying to deal with the modes.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/119568
gcc/ChangeLog:
* tree-ssa-math-opts.cc (convert_plusminus_to_widen): Reject different
mode classes.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Implements P2353R5 "Extending associative containers with the
remaining heterogeneous overloads". Adds overloads templated on
heterogeneous key types for several members of associative
containers, particularly insertions:
(Nothing is added to the multiset or multimap tree containers.)
All the insert*() and try_emplace() members also get a hinted
overload. The at() members get const and non-const overloads.
The new overloads require the concept __heterogeneous_tree_key or
__heterogeneous_hash_key, as in P2077, to enforce that the
function objects provided meet the requirements, and that the key
supplied is not an iterator or the native key. Insertions
implicitly construct the required key_type object from the
argument, by move where permitted.
Philipp Tomsich [Wed, 4 Mar 2026 08:49:09 +0000 (09:49 +0100)]
avoid-store-forwarding: Clear sbitmap before use [PR124351]
The forwarded_bytes sbitmap needs to be zeroed after allocation,
as sbitmaps are not implicitly initialized. This caused valgrind
warnings about conditional jumps depending on uninitialised values.
gcc/ChangeLog:
PR rtl-optimization/124351
* avoid-store-forwarding.cc (process_store_forwarding): Add
bitmap_clear after allocating forwarded_bytes.
Jakub Jelinek [Wed, 4 Mar 2026 08:38:28 +0000 (09:38 +0100)]
i386: Fix up vcvt<convertfp8_pack><mode><mask_name> for -masm=intel [PR124341]
The vcvt<convertfp8_pack><mode><mask_name> pattern uses the wrong
<mask_operand?> for -masm=intel, so the testcase fails to assemble; it
emits something like {ymm1} instead of {k1}.
2026-03-04 Jakub Jelinek <jakub@redhat.com>
PR target/124341
* config/i386/sse.md (vcvt<convertfp8_pack><mode><mask_name>): Use
<mask_operand3> rather than <mask_operand2> for -masm=intel.
Jakub Jelinek [Wed, 4 Mar 2026 08:34:33 +0000 (09:34 +0100)]
i386: Fix up printing of input operand of avx10_2_comisbf16_v8bf for -masm=intel [PR124349]
gas expects the second operand, if in memory, to be WORD PTR rather than
XMMWORD PTR. The following patch fixes it by using %w1 instead of %1; if
the operand is a register, it is printed as xmm1 in both cases.
2026-03-04 Jakub Jelinek <jakub@redhat.com>
PR target/124349
* config/i386/sse.md (avx10_2_comisbf16_v8bf): Use %w1 instead of %1
for -masm=intel.
Richard Biener [Wed, 4 Mar 2026 08:25:27 +0000 (09:25 +0100)]
Adjust gcc.dg/vect/vect-reduc-dot-s8b.c again
A failure on sparc shows that the dump scan for dot-prod is fragile
enough. The following simply removes it given it serves no actual
purpose and adds comments in place.
* gcc.dg/vect/vect-reduc-dot-s8b.c: Remove scan for
dot_prod pattern matching.
liuhongt [Wed, 4 Mar 2026 02:49:37 +0000 (18:49 -0800)]
Refine the testcase.
> This testcase fails with binutils 2.35:
vmovw is supported only in binutils 2.38 and later, so we need
/* { dg-require-effective-target avx512fp16 } */ to avoid errors.
> ```
> /tmp/ccf20y5C.s:20: Error: no such instruction: `vmovw xmm0,WORD PTR .LC0[rip]'
> /tmp/ccf20y5C.s:21: Error: no such instruction: `vmovw WORD PTR [rbp-18],xmm0'
> /tmp/ccf20y5C.s:22: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> /tmp/ccf20y5C.s:23: Error: no such instruction: `vmovw WORD PTR [rbp-20],xmm0'
> /tmp/ccf20y5C.s:24: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> /tmp/ccf20y5C.s:25: Error: no such instruction: `vmovw WORD PTR [rbp-22],xmm0'
> /tmp/ccf20y5C.s:26: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> /tmp/ccf20y5C.s:27: Error: no such instruction: `vmovw WORD PTR [rbp-24],xmm0'
> /tmp/ccf20y5C.s:28: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> /tmp/ccf20y5C.s:29: Error: no such instruction: `vmovw WORD PTR [rbp-26],xmm0'
> /tmp/ccf20y5C.s:30: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> ```
>
> Thanks,
> Andrew Pinski
gcc/testsuite/ChangeLog:
PR target/124335
* gcc.target/i386/avx512fp16-pr124335.c: Require target
avx512fp16 instead of avx512bw.
H.J. Lu [Sun, 22 Feb 2026 02:32:30 +0000 (10:32 +0800)]
x86: Call ix86_access_stack_p only with symbolic constant load
ix86_access_stack_p can be quite expensive. Cache the result and call it
only if there are symbolic constant loads. This reduces the compile time
of PR target/124165 test from 202 seconds to 55 seconds.
gcc/
PR target/124165
* config/i386/i386-protos.h (symbolic_reference_mentioned_p):
Change the argument type from rtx to const_rtx.
* config/i386/i386.cc (symbolic_reference_mentioned_p): Likewise.
(ix86_access_stack_p): Add 2 auto_bitmap[] arguments. Cache
the register BB domination result.
(ix86_symbolic_const_load_p_1): New.
(ix86_symbolic_const_load_p): Likewise.
(ix86_find_max_used_stack_alignment): If there is no symbolic
constant load into the register, don't call ix86_access_stack_p.
[PR115042, LRA]: Postpone processing of new reload insns, 2nd variant
This is the second attempt to solve the PR. The first attempt (see
commit 9a7da540b63e7d77e747b5cdd6fdbbd3954e28c8) resulted in numerous
test suite failures on some secondary targets.
LRA in this PR cannot find regs for an asm insn which requires 11
general regs when 13 regs are available. The Arm subtarget (thumb) has
two kinds of stores, with low and high general regs. LRA systematically
chooses the stores involving low regs as having lower cost, and there are
only 8 low regs. That is because LRA (and reload) chooses (mov) insn
alternatives independently of register pressure.
The proposed patch postpones processing new reload insns until the
reload pseudos are assigned, and after that considers the new reload
insns. We postpone reloads only for asm insns, as they can have a lot of
operands. Depending on the assignment, LRA chooses insns involving low
or high regs. Generally speaking it can change code generation for
better or worse, but that should be a very rare case.
The patch does not contain the test as original test is too big (300KB
of C code). Unfortunately cvise after 2 days of work managed to
decrease the test only to 100KB file.
gcc/ChangeLog:
PR target/115042
* lra-int.h (lra_postponed_insns): New.
* lra.cc (lra_set_insn_deleted, lra_asm_insn_error): Clear
postponed insn flag.
(lra_process_new_insns): Propagate postponed insn flag for asm
gotos.
(lra_postponed_insns): New.
(lra): Initialize lra_postponed_insns. Push postponed insns on
the stack.
* lra-constraints.cc (postpone_insns): New function.
(curr_insn_transform): Use it to postpone processing reload insn
constraints. Skip processing postponed insns.
Mark Wielaard [Tue, 3 Mar 2026 19:34:58 +0000 (20:34 +0100)]
libgfortran: Regenerate config.h.in and configure
commit e13b14030a30 ("Fortran: Fix libfortran cannot be cross compiled
[PR124286]") updated configure.ac but didn't regenerate config.h.in
with autoheader. Also some line numbers were still wrong in
configure. Fix this by explicitly regenerating both files with
autoheader and autoconf version 2.69.
PR libstdc++/122217
* testsuite/27_io/filesystem/operations/copy_symlink/1.cc: New
test.
* testsuite/27_io/filesystem/operations/copy_symlink/2.cc: New
test.
* testsuite/27_io/filesystem/operations/copy_symlink/3.cc: New
test.
* testsuite/27_io/filesystem/operations/copy_symlink/4.cc: New
test.
Jerry DeLisle [Tue, 3 Mar 2026 04:02:58 +0000 (20:02 -0800)]
Fortran: Fix failures on windows and hpux systems [PR124330]
Co-authored-by: John David Anglin <danglin@gcc.gnu.org>
PR fortran/124330
libgfortran/ChangeLog:
* caf/shmem/shared_memory.c: Fix filenames for WIN32
includes.
(shared_memory_set_env): Use putenv() for HPUX and as
a fallback where setenv () is not available.
(NAME_MAX): Replace with SHM_NAME_MAX.
(SHM_NAME_MAX): Use this to avoid duplicating NAME_MAX
used elsewhere.
* caf/shmem/supervisor.c (get_image_num_from_envvar): Add
a fallback for HPUX. Add additional comment to explain why
the number of cores is used in lieu of GFORTRAN_NUM_IMAGES.
Martin Uecker [Fri, 20 Feb 2026 16:19:10 +0000 (17:19 +0100)]
c: Fix wrong code related to TBAA for components of structure types 2/2 [PR122572]
Given the following two types, the C FE assigns the same
TYPE_CANONICAL to both struct bar, because it treats pointers to
tagged types with the same tag as compatible (in this context).
struct foo { int y; };
struct bar { struct foo *c; }
struct foo { long y; };
struct bar { struct foo *c; }
get_alias_set records the components of aggregate types, but only
considers the components of the canonical version. To prevent
miscompilation, we create a modified canonical type where we
change such pointers to void pointers.
PR c/122572
gcc/c/ChangeLog:
* c-decl.cc (finish_struct): Add distinct canonical type.
* c-tree.h (c_type_canonical): Prototype for new function.
* c-typeck.cc (c_type_canonical): New function.
(ptr_to_tagged_member): New function.
gcc/testsuite/ChangeLog:
* gcc.dg/pr123356-2.c: New test.
* gcc.dg/struct-alias-2.c: New test.
Martin Uecker [Tue, 6 Jan 2026 18:26:42 +0000 (19:26 +0100)]
c: Fix wrong code related to TBAA for components of structure types 1/2 [PR122572]
When computing TYPE_CANONICAL we form equivalence classes of types
ignoring some aspects. In particular, we treat two structure / union
types as equivalent if a member is a pointer to another tagged type
which has the same tag, even if this pointed-to type is otherwise not
compatible. The fundamental reason why we do this is that even in a
single TU the equivalence class needs to be consistent with compatibility
of incomplete types across TUs. (LTO globs such pointers to void*).
The bug is that the test also incorrectly treated two pointed-to types
without a tag as equivalent. One would expect that this just pessimizes
aliasing decisions, but due to how the middle-end handles TBAA for
components of structures, it leads to wrong code.
Jakub Jelinek [Tue, 3 Mar 2026 14:47:08 +0000 (15:47 +0100)]
i386: Use orb instead of orl/orq for stack probes/clash [PR124336]
This PR is about an inconsistency between AT&T and Intel syntax
for output_adjust_stack_and_probe/output_probe_stack_range.
On ia32 they use both orl or or BYTE PTR, i.e. 32-bit or,
but on x86_64 in AT&T syntax they use orq (i.e. 64-bit or) and
in Intel syntax they use or DWORD PTR (i.e. 32-bit or).
These cases are used when probing stack in a loop, for each
page one probe. There is also the probe_stack named pattern
which currently uses word_mode or (i.e. 64-bit or for x86_64)
for both syntaxes, used when probing only once.
Functionally, I think whether we do an 8-bit or 32-bit or 64-bit
or with 0 constant doesn't matter, we don't modify any values on the
stack, just pretend to modify it. The 8-bit and 32-bit ors are, though,
1 byte shorter than the 64-bit one. How the 3 behave
performance-wise is unknown; if the particular probed spot on the
stack hasn't been stored/read for a while and won't be for a while,
then I'd think it shouldn't matter, dunno if there can be store
forwarding effects if it has been e.g. written or read very recently
by some other function as say 32-bit access and now is 8-bit. The
access after the probe (if it happens soon enough) should be in valid
programs a store (and again, dunno if there can be issues if the
sizes are different).
Now, for consistency reasons, we could just make the Intel
syntax match the AT&T and use 64-bit or on x86_64, so
use QWORD PTR instead of DWORD PTR if stack_pointer_rtx is 64-bit
in those 2 functions and be done with it.
Another possibility is use always 32-bit ors (in both those 2 functions
and probe_stack*; similar to the posted patch except testsuite changes
aren't needed and s/{b}/{l}/g;s/QI/SI/g;s/BYTE PTR/DWORD PTR/g) and
last option is to always use 8-bit ors (which is what the following
patch does). Or some other mix, say use 32-bit ors for -Os/-Oz and
64-bit ors otherwise.
2026-03-03 Jakub Jelinek <jakub@redhat.com>
PR target/124336
* config/i386/i386.cc (output_adjust_stack_and_probe): Use
or{b} rather than or%z0 and BYTE PTR rather than DWORD PTR.
(output_probe_stack_range): Likewise.
* config/i386/i386.md (probe_stack): Pass just 2 arguments
to gen_probe_stack_1, first adjust_address to QImode, second
const0_rtx.
(@probe_stack_1_<mode>): Remove.
(probe_stack_1): New define_insn.
* gcc.target/i386/stack-check-11.c: Allow orb next to orl/orq.
* gcc.target/i386/stack-check-18.c: Likewise.
* gcc.target/i386/stack-check-19.c: Likewise.
Jakub Jelinek [Tue, 3 Mar 2026 14:44:19 +0000 (15:44 +0100)]
c++: Set OLD_PARM_DECL_P even in regenerate_decl_from_template [PR124306]
The following testcase ICEs because we try to instantiate the PARM_DECLs
of foo <int> twice: once when parsing ^^foo <int>, remembering a PARM_DECL
from there in a REFLECT_EXPR, and later on when
regenerate_decl_from_template is called and creates a new set of
PARM_DECLs and changes DECL_ARGUMENTS (or something later on in that
chain) to the new set.
This means when we call parameters_of on ^^foo <int> later on, they won't
compare equal to the earlier acquired ones, and when we do e.g. type_of
or other operation on the old PARM_DECL where it needs to search the
DECL_ARGUMENTS (DECL_CONTEXT (parm_decl)) list, it will ICE because it
won't find it there.
The following patch fixes it similarly to how duplicate_decls deals
with those, by setting OLD_PARM_DECL_P flag on the old PARM_DECLs, so that
before using reflections of those we search DECL_ARGUMENTS and find the
corresponding new PARM_DECL.
2026-03-03 Jakub Jelinek <jakub@redhat.com>
PR c++/124306
* pt.cc (regenerate_decl_from_template): Mark the old PARM_DECLs
replaced with tsubst_decl result with OLD_PARM_DECL_P flag.
Marek Polacek [Mon, 2 Mar 2026 15:42:29 +0000 (10:42 -0500)]
c++/reflection: static member template operator [PR124324]
This testcase didn't compile properly because eval_is_function and
eval_extract got an unresolved TEMPLATE_ID_EXPR. We used to resolve
them in process_metafunction but I removed that call, thinking it was
no longer necessary. This patch puts it in eval_substitute which
should cover it.
Richard Biener [Mon, 2 Mar 2026 14:08:03 +0000 (15:08 +0100)]
Do not mark stmts PURE_SLP for loop vectorization
Remove this legacy marking from loop vectorization code and adjust
few leftovers from the removal of hybrid SLP support.
* tree-vect-slp.cc (vect_make_slp_decision): Do not call
vect_mark_slp_stmts.
* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment):
We are always doing SLP.
(vect_supportable_dr_alignment): Likewise.
* tree-vect-loop.cc (vect_analyze_loop_2): No need to reset
STMT_SLP_TYPE.
Jonathan Yong [Thu, 26 Feb 2026 11:24:13 +0000 (11:24 +0000)]
gcc: libgdiagnostics DLL for mingw should be for mingw hosts
Fix incorrect attempts to build libgdiagnostics by naming it as a DLL
when gcc is configured as a cross compiler that targets mingw but is
hosted on non-Windows systems.
gcc/ChangeLog:
* Makefile.in: The libgdiagnostics shared object for mingw
should be based on the host name, not the target name.
rtl-ssa: Ensure live-out uses before redefinitions [PR123786]
This patch fixes cases in which:
(1) a register is live in to an EBB;
(2) the register is live out of at least one BB in the EBB; and
(3) the register is redefined by a later BB in the same EBB.
We were supposed to create live-out uses for (2), so that the redefinition
in (3) cannot be moved up into the live range of (1).
The patch does this by collecting all definitions in second and
subsequent BBs of an EBB. It then creates degenerate phis for those
registers that do not naturally need phis. For speed and simplicity,
the patch does not check for (2). If a register is live in to the EBB,
then it must be used somewhere, either in the EBB itself or in a
successor outside of the EBB. A degenerate phi would eventually
be needed in either case.
This requires moving append_bb earlier, so that add_phi_nodes can
iterate over the BBs in an EBB.
live_out_value contained an on-the-fly optimisation to remove redundant
phis. That was a mistake. live_out_value can be called multiple times
for the same quantity. Replacing a phi on-the-fly messes up bookkeeping
for second and subsequent calls.
The live_out_value optimisation was mostly geared towards memory.
As an experiment, I added an assert for when the optimisation applied
to registers. It only fired once in an x86_64-linux-gnu bootstrap &
regression test, in gcc.dg/tree-prof/split-1.c. That's a very poor
(but unsurprising) return. And the optimisation will still be done
eventually anyway, during the phi simplification phase. Doing it on
the fly was just supposed to allow the phi's memory to be reused.
The patch therefore moves the optimisation into add_phi_nodes and
restricts it to memory (for which it does make a difference).
gcc/
PR rtl-optimization/123786
* rtl-ssa/functions.h (function_info::live_out_value): Delete.
(function_info::create_degenerate_phi): New overload.
* rtl-ssa/blocks.cc (all_uses_are_live_out_uses): Delete.
(function_info::live_out_value): Likewise.
(function_info::replace_phi): Keep live-out uses if they are followed
by a definition in the same EBB.
(function_info::create_degenerate_phi): New overload, extracted
from create_reg_use.
(function_info::add_phi_nodes): Ensure that there is a phi for
every live input that is redefined by a second or subsequent
block in the EBB. Record that such phis need live-out uses.
(function_info::record_block_live_out): Use look_through_degenerate_phi
rather than live_out_value when setting phi inputs. Remove use of
live_out_value for live-out uses. Inline the old handling of
bb_mem_live_out.
(function_info::start_block): Move append_bb call to...
(function_info::create_ebbs): ...here.
* rtl-ssa/insns.cc (function_info::create_reg_use): Use the new
create_degenerate_phi overload.
gcc/testsuite/
PR rtl-optimization/123786
* gcc.target/aarch64/pr123786.c: New test.
Jakub Jelinek [Tue, 3 Mar 2026 08:51:33 +0000 (09:51 +0100)]
i386: Fix up some FMA patterns for -masm=intel [PR124315]
The following 4 define_insns don't have matching operands between AT&T and
Intel syntax: %3 is "0" and %1 was missing.
I searched with grep '%0%{%4%}|%0%{%4%}' *.md and didn't find other spots
where the operand numbers don't match (in reverse order, of course).
2026-03-03 Jakub Jelinek <jakub@redhat.com>
PR target/124315
* config/i386/sse.md (avx512f_vmfmadd_<mode>_mask3<round_name>,
avx512f_vmfmsub_<mode>_mask3<round_name>,
avx512f_vmfnmadd_<mode>_mask3<round_name>,
avx512f_vmfnmsub_<mode>_mask3<round_name>): Use %<iptr>1 instead of
%<iptr>3 in -masm=intel syntax.
Jakub Jelinek [Tue, 3 Mar 2026 08:49:33 +0000 (09:49 +0100)]
i386: Rename avx512fp16_mov<mode> to *avx512fp16_mov<mode>
On Mon, Mar 02, 2026 at 08:04:53PM +0800, Hongtao Liu wrote:
> You are correct. There is no place that calls
> gen_avx512fp16_mov{v8hf,v8bf,v8hi}. The original pattern's name is
> avx512fp16_vmovsh which is added in r12-3407-g9e2a82e1f9d2c4, there's
> also another pattern named *avx512fp16_movsh . At that time, the * was
> added to distinguish between these two patterns.
> And yes, we can add * to the pattern name.
Richard Biener [Tue, 3 Mar 2026 08:18:36 +0000 (09:18 +0100)]
Remove XFAIL for detecting dot-product pattern in vect-reduc-dot-s8b.c
With the change to vect_reassociating_reduction_p this pattern will
always match (application is still conditional on uarch availability),
so remove the XFAIL.
PR testsuite/122961
* gcc.dg/vect/vect-reduc-dot-s8b.c: Remove XFAIL on
dot-prod pattern detection.
Patrick Palka [Tue, 3 Mar 2026 03:37:15 +0000 (22:37 -0500)]
c++: improve constraint recursion diagnostic
Our constraint recursion diagnostics are not ideal because they
usually show the atom with an uninstantiated parameter mapping, e.g
concepts-recursive-sat5.C:6:41: error: satisfaction of atomic constraint 'requires(A a, T t) {a | t;} [with T = T]' depends on itself
This is a consequence of our two-level caching of atomic constraints,
where we first cache the uninstantiated atom+args and then the
instantiated atom+no args, and most likely the first level of caching
detects the recursion, at which point we have no way to get a hold of
the instantiated atom.
This patch fixes this by linking the first level of caching to the
second level, so that we can conveniently print the instantiated atom
when constraint recursion is detected from the first level of caching.
Alternatively we could make only the second level of caching diagnose
constraint recursion but then we'd no longer catch constraint recursion
that occurs during parameter mapping instantiation. This current approach
seems simpler, and it also seems natural to have the two cache entries
somehow linked anyway.
gcc/cp/ChangeLog:
* constraint.cc (struct sat_entry): New data member inst_entry.
(satisfaction_cache::satisfaction_cache): Initialize inst_entry.
(satisfaction_cache::get): Use it to prefer printing the
instantiated atom in case of constraint recursion.
(satisfy_atom): Set inst_entry of the first cache entry to point
to the second entry.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-recursive-sat2.C: Verify that the
instantiated parameter mapping is printed.
* g++.dg/cpp2a/concepts-recursive-sat5.C: Likewise.
In the first testcase below, the targ generic lambda
template<class T, class V = decltype([](auto) { })>
...
has two levels of parameters, the outer level {T} and its own level.
We iteratively substitute into this targ lambda three times:
1. The first substitution is during coerce_template_parms with args={T*, }
and tf_partial set. Since tf_partial is set, we defer the substitution.
2. The next substitution is during regeneration of f<void>()::<lambda>
with args={void}. Here we merge with the deferred arguments to
obtain args={void*, } and substitute them into the lambda, returning
a regenerated generic lambda with template depth 1 (no more outer
template parameters).
3. The final (non-templated) substitution is during instantiation of
f<int>()::<lambda>'s call operator with args={int}. But at this
point, the targ generic lambda has only one set of template
parameters, its own, and so this substitution causes us to substitute
away all its template parameters (and its deduced return type).
We end up ICEing from tsubst_template_decl due to its operator()
now having an empty template parameter set.
The problem ultimately is that the targ lambda leaks into a template
context that has more template parameters than its lexical context, and
we end up over-substituting into the lambda. By the third substitution
the lambda is effectively non-dependent and we really just want to lower
it to a non-templated lambda without actually doing any substitution.
Unfortunately, I wasn't able to get such lowering to work adequately
(e.g. precise dependence checks don't work, uses_template_parms (TREE_TYPE (t))
wrongly returns false, false, true respectively during each of the three
substitutions.)
This patch instead takes a different approach, and makes lambda
deferred-ness sticky: once we decide to defer substitution into a
lambda, we keep deferring any subsequent substitution until the
final substitution, which must be non-templated. So for this
particular testcase the substitutions are now:
1. Return a lambda with deferred args={T*, }.
2. Merge args={void} with deferred args={T*, }, obtaining args={void*, }
and returning a lambda with deferred args={void*, }.
3. Merge args={int} with deferred args={void*, }, obtaining args={void*, }.
Since this substitution is final (processing_template_decl is cleared),
we substitute args={void*, } into the lambda once and for all and
return a regenerated non-templated generic lambda with template depth 1.
In order for a subsequent add_extra_args to properly merge arguments
that have been iteratively deferred, it and build_extra_args need
to propagate TREE_STATIC appropriately (which effectively signals
whether the arguments are a full set or not).
While PR123655 is a regression, this patch also fixes the similar
PR123408, which is not a regression. Thus, I suspect that the testcase
from the first PR only worked by accident.
PR c++/123665
PR c++/123408
gcc/cp/ChangeLog:
* pt.cc (build_extra_args): If TREE_STATIC was set on the
arguments, keep it set.
(add_extra_args): Set TREE_STATIC on the resulting arguments
when substituting templated arguments into a full set of
deferred arguments.
(tsubst_lambda_expr): Always defer templated substitution if
LAMBDA_EXPR_EXTRA_ARGS was set.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/lambda-targ22.C: New test.
* g++.dg/cpp2a/lambda-targ22a.C: New test.
* g++.dg/cpp2a/lambda-targ23.C: New test.
Robert Dubner [Mon, 2 Mar 2026 20:36:40 +0000 (15:36 -0500)]
cobol: Improved efficiency of code generated for MOVE "A" TO VAR(1:1). [119456]
This PR rightly noted that COBOL source code which obviously could
result in simple machine language did not. These changes take advantage
of the compiler knowing, at compile time, the values of literal offsets
and lengths, and uses that knowledge to generate much more efficient
GENERIC for those cases.
gcc/cobol/ChangeLog:
PR cobol/119456
* genapi.cc (mh_source_is_literalA): Don't set refmod_e attribute
unless it is necessary.
(have_common_parent): Helper routine that determines whether two
COBOL variables are members of the same data description.
(mh_alpha_to_alpha): Modified for greater efficiency when table
subscripts and reference modification parameters are numeric
literals.
* genutil.cc (get_data_offset): Recognizes when table subscripts
and refmod offsets are numeric literals.
(refer_size): Recognizes when refmod offsets are numeric literals.
(refer_size_source): Recognizes when table subscripts are numeric
literals.
doc: Switch some attribute examples to using standard syntax [PR102397]
To finish up PR102397, I've switched some of the attribute examples to
use the new standard syntax (in addition to the few examples that were
already there). Because the old syntax is so common in existing code,
I don't think we want to switch all of the examples -- although when
folks add new attributes going forward, I'd recommend using the
standard syntax in the documentation.
I tested that all the modified examples are accepted by GCC. There
are relatively few examples of target-specific attributes for the
targets I have existing builds for or can build easily to use for such
testing, so I decided to just to leave all the target-specific
examples alone and focus on the common attributes.
gcc/ChangeLog
PR c++/102397
* doc/extend.texi (Attributes): Explicitly say that all attributes
work in both syntaxes and examples may show either form.
(Common Attributes): Convert some examples to use the new syntax.
The unordered containers have 2 types of iterators, the usual ones and the
local_iterator to iterate through a given bucket. In _GLIBCXX_DEBUG mode there
are then 4 lists of iterators, 2 for iterator/const_iterator and 2 for
local_iterator/const_local_iterator.
This patch makes sure that the unordered container's mutex is locked and
unlocked only once when those lists of iterators need to be iterated for
invalidation purposes.
Also remove calls to _M_check_rehashed after erase operations. The standard
does not permit rehashing on an erase operation, so we will never implement it.
libstdc++-v3/ChangeLog
* include/debug/safe_unordered_container.h
(_Safe_unordered_container::_M_invalidate_locals): Remove.
(_Safe_unordered_container::_M_invalidate_all): Lock mutex while calling
_M_invalidate_if and _M_invalidate_locals.
(_Safe_unordered_container::_M_invalidate_all_if): New.
(_Safe_unordered_container::_M_invalidate): New.
(_Safe_unordered_container::_M_invalidate_if): Make private, add __scoped_lock
argument.
(_Safe_unordered_container::_M_invalidate_local_if): Likewise.
* include/debug/safe_unordered_container.tcc
(_Safe_unordered_container::_M_invalidate_if): Adapt and remove lock.
(_Safe_unordered_container::_M_invalidate_local_if): Likewise.
* include/debug/unordered_map
(unordered_map::erase(const_iterator, const_iterator)): Lock before loop on
iterators. Remove _M_check_rehashed call.
(unordered_map::_M_self): New.
(unordered_map::_M_invalidate): Remove.
(unordered_map::_M_erase): Adapt and remove _M_check_rehashed call.
(unordered_multimap::_M_erase(_Base_iterator, _Base_iterator)): New.
(unordered_multimap::erase(_Kt&&)): Use the latter.
(unordered_multimap::erase(const key_type&)): Likewise.
(unordered_multimap::erase(const_iterator, const_iterator)):
Lock before loop on iterators. Remove _M_check_rehashed.
(unordered_multimap::_M_self): New.
(unordered_multimap::_M_invalidate): Remove.
(unordered_multimap::_M_erase): Adapt. Remove _M_check_rehashed call.
* include/debug/unordered_set
(unordered_set::erase(const_iterator, const_iterator)): Add lock before loop
for iterator invalidation. Remove _M_check_rehashed call.
(unordered_set::_M_self): New.
(unordered_set::_M_invalidate): Remove.
(unordered_set::_M_erase): Adapt and remove _M_check_rehashed call.
(unordered_multiset::_M_erase(_Base_iterator, _Base_iterator)): New.
(unordered_multiset::erase(_Kt&&)): Use the latter.
(unordered_multiset::erase(const key_type&)): Likewise.
(unordered_multiset::erase(const_iterator, const_iterator)):
Lock before loop on iterators. Remove _M_check_rehashed.
(unordered_multiset::_M_self): New.
(unordered_multiset::_M_invalidate): Remove.
(unordered_multiset::_M_erase): Adapt. Remove _M_check_rehashed call.
Filip Kastl [Mon, 2 Mar 2026 13:33:06 +0000 (14:33 +0100)]
sparc: Don't require a sparc assembler with TLS [PR123926]
Since r16-6798, it wasn't possible to build a sparc GCC without having
a sparc assembler installed. That shouldn't be the case, since there are
use cases for just compiling into assembly.
The problem was sparc.h doing '#define TARGET_TLS HAVE_AS_TLS'.
Building GCC failed when HAVE_AS_TLS wasn't defined, which is the case
when one doesn't have an assembler with TLS support installed at
./configure time.
This patch addresses the problem.
Pushing as obvious.
PR target/123926
gcc/ChangeLog:
* config/sparc/sparc.h (HAVE_AS_TLS): Default to 0.
Jakub Jelinek [Mon, 2 Mar 2026 14:44:40 +0000 (15:44 +0100)]
testsuite: Fix up vec-cvt-1.c for excess precision target [PR124288]
The intent of the code is to find the largest (or smallest) representable
float (or double) smaller than (or greater than) or equal to the given
integral maximum (or minimum).
The code uses volatile vars to avoid excess precision, but was relying on
(volatile_var1 = something1 - something2) == volatile_var2
to actually store the subtraction into volatile var and read it from there,
making it an optimization barrier. That is not the case: we compare the
rhs of the assignment expression directly with volatile_var2, so on excess
precision targets this can result in unwanted optimizations.
Fixed by using a comma expression to make sure the comparison doesn't know
the value being compared.
2026-03-02 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/124288
* gcc.dg/torture/vec-cvt-1.c (FLTTEST): Use comma expression
to store into {flt,dbl}m{in,ax} and read from it again for
comparison.
Alfie Richards [Mon, 23 Feb 2026 13:13:30 +0000 (13:13 +0000)]
aarch64: Fix FMV reachability and cgraph_node definition value [PR124167]
Fix the reachability checks for FMV nodes which were put in the wrong
place and fix the definition value for a dispatched symbol to match
that of the default node.
PR target/124167
gcc/ChangeLog
* attribs.cc (make_dispatcher_decl): Change node->definition
to inherit from the node it is called on.
* ipa.cc (remove_unreachable_nodes): Move FMV logic out of the
(!in_boundary_p) if block.
This patch adds line_info debug information support to .BTF.ext
sections.
Line info is used by the BPF verifier to improve error reporting and
give errors that reference the source code more precisely.
gcc/ChangeLog:
PR target/113453
* config/bpf/bpf-protos.h (bpf_output_call): Change prototype.
* config/bpf/bpf.cc (bpf_output_call): Change to adapt operands and
return the instruction template instead of immediately emitting asm,
which did not allow the proper expected final execution flow.
(bpf_output_line_info): Add function to introduce line info
entries in the respective structures.
(bpf_asm_out_unwind_emit): Add function as hook to
TARGET_ASM_UNWIND_EMIT. This hook is called before any
instruction is emitted.
* config/bpf/bpf.md: Change calls to bpf_output_call.
* config/bpf/btfext-out.cc (struct btf_ext_lineinfo): Add fields
to struct.
(bpf_create_lineinfo, btf_add_line_info_for): Add support
functions to insert line_info data in the respective structures.
(output_btfext_line_info): Function to emit line_info data in
.BTF.ext section.
(btf_ext_output): Call output_btfext_line_info.
* config/bpf/btfext-out.h: Add prototype for
btf_add_line_info_for.