Peter Bergner [Wed, 14 Jan 2026 21:12:21 +0000 (15:12 -0600)]
RISC-V: Enable the ZD constraint only when xmipscbop is enabled [PR123092]
The ZD constraint is specific to the mips prefetch instruction. It is
currently always enabled, leading to ICEs when xmipscbop is disabled.
Solved by only enabling the ZD constraint whenever xmipscbop is enabled.
2026-01-14 Peter Bergner <bergner@tenstorrent.com>
gcc/
PR target/123092
* config/riscv/constraints.md (ZD): Disable when xmipscbop is disabled.
Signed-off-by: Peter Bergner <bergner@tenstorrent.com>
Qing Zhao [Wed, 14 Jan 2026 17:14:01 +0000 (17:14 +0000)]
C: fix issues when supporting counted_by fields in anonymous structure/unions [PR122495,PR122496]
Currently, GCC does not fully support the cases when a FAM or pointer field and
its corresponding counted_by field are in different anonymous structure/unions
of a common named structure.
For example:
    struct nested_mixed {
      struct {
        union {
          int b;
          float f;
        };
        int n;
      };
      struct {
        PTR_TYPE *pointer __attribute__((__counted_by__(n)));
        FAM_TYPE c[] __attribute__((__counted_by__(b)));
      };
    } *nested_mixed_annotated;
In order to support such cases, we always need to locate the first outer
named structure as the root, and then lookup_field inside this named
structure. When building the component_ref for the counted_by field,
we need to build a chain of component_ref starting from the root structure.
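As an illustration of the layout discussed above, here is a minimal sketch in which the count and the flexible array member sit in different anonymous structs of one named struct. The names struct message, message_new and COUNTED_BY are hypothetical and do not come from the patch or its tests. The version guard is an assumption (that only GCC 16 or later accepts such a reference); on other compilers the annotation expands to nothing so the code still builds.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: only GCC 16 or later accepts a counted_by reference that
   crosses anonymous struct/union boundaries; elsewhere the annotation
   expands to nothing so the sketch still compiles.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 16
# define COUNTED_BY(member) __attribute__ ((__counted_by__ (member)))
#else
# define COUNTED_BY(member)
#endif

/* Hypothetical type: the count and the flexible array member live in
   different anonymous structs of the same named struct.  */
struct message
{
  struct
  {
    int len;                    /* Number of elements in text[].  */
  };
  struct
  {
    int tag;
    char text[] COUNTED_BY (len);
  };
};

/* Allocate a message holding a copy of S (without the terminating NUL).
   The count is stored before the array is touched.  */
struct message *
message_new (const char *s)
{
  size_t n = strlen (s);
  struct message *m = malloc (sizeof *m + n);
  if (m == NULL)
    return NULL;
  m->len = (int) n;
  m->tag = 0;
  memcpy (m->text, s, n);
  return m;
}
```

Both len and text are reached through the named struct message, which corresponds to the "root" structure the description refers to.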
When supporting the above in general, we also need to handle the following
several special cases correctly:
A. Support an untagged type as its own top-level type.
    struct { int a; char b[] __attribute__ ((counted_by (a))); } *x;
B. Support an unnamed field with a named struct/union.
    struct s { struct { int a; char b[] __attribute__ ((counted_by (a))); } *x; } *y;
C. When -fms-extensions is enabled:
C.1 Do not support the inward-to-outward counted-by field reference
since checking the validity of such reference depends on unknown
situation at the end of the structure definition.
struct bar {
char *buf __counted_by (n); /* { dg-error "attribute is not a field declaration in the same structure as" } */
};
C.2 support the outward-to-inward counted-by field reference.
PR C/122495
PR C/122496
gcc/c/ChangeLog:
* c-decl.cc (grokfield): Call verify_counted_by_for_top_anonymous_type
for named field.
(verify_counted_by_attribute): Change the prototype to a recursive
routine.
(verify_counted_by_for_top_anonymous_type): New routine.
(finish_struct): Set C_TYPE_FIELDS_HAS_COUNTED_BY and call the routine
verify_counted_by_attribute only for named structure.
* c-parser.cc (c_parser_declaration_or_fndef): Call
verify_counted_by_for_top_anonymous_type for the decl.
(c_parser_parameter_declaration): Call
verify_counted_by_for_top_anonymous_type for the parameter.
(c_parser_type_name): Call verify_counted_by_for_top_anonymous_type
for the type.
* c-tree.h (C_TYPE_FIELDS_HAS_COUNTED_BY): New flag.
(verify_counted_by_for_top_anonymous_type): New routine.
* c-typeck.cc (build_counted_by_ref): Locate the root named structure,
build a chain of component_ref starting from the root structure.
gcc/testsuite/ChangeLog:
* gcc.dg/counted-by-anonymous-2-char.c: New test.
* gcc.dg/counted-by-anonymous-2-float.c: New test.
* gcc.dg/counted-by-anonymous-2-struct.c: New test.
* gcc.dg/counted-by-anonymous-2-union.c: New test.
* gcc.dg/counted-by-anonymous-2.c: New test.
* gcc.dg/counted-by-anonymous-3.c: New test.
* gcc.dg/counted-by-anonymous.c: New test.
* gcc.dg/ubsan/counted-by-anonymous-bounds-1.c: New test.
* gcc.dg/ubsan/counted-by-anonymous-bounds-2.c: New test.
* gcc.dg/ubsan/counted-by-anonymous-bounds.c: New test.
Martin Jambor [Wed, 14 Jan 2026 19:41:57 +0000 (20:41 +0100)]
ipa-cp: Always return the right type in ipa_value_from_jfunc (PR123542)
PR 123542 is about triggering a checking assert that verifies that we
indeed clone a function for the constant value we started evaluating.
The issue is that we get a double 2 instead of a float 2 which comes
down to function ipa_value_from_jfunc not doing the necessary
conversion when dealing directly with constants (and ancestor jump
functions but that is very unlikely to cause problems).
This patch makes sure the required conversion is performed in all
cases (even for the ancestor JFs) and checks that the result type is
known, because when the function is invoked from ipa-modref.cc or
ipa-fnsummary.cc that may not be the case.
gcc/ChangeLog:
2026-01-14 Martin Jambor <mjambor@suse.cz>
PR ipa/123542
* ipa-cp.cc (ipa_value_from_jfunc): Always use
ipacp_value_safe_for_type. Bail out if parm_type is NULL.
Joseph Myers [Wed, 14 Jan 2026 17:10:33 +0000 (17:10 +0000)]
testsuite: Enable cross testing for simulate-thread tests
The simulate-thread tests exit early in cross and remote cases. Apply
fixes similar to (but affecting separate code) those recently posted
for the guality tests: do not use [transform gdb] since that's a cross
GDB and the tests expect to run GDB on the target, test existence on
the target not the build system, and copy required files to the target
(deleting them later).
Tested for x86_64-pc-linux-gnu to make sure native testing isn't
broken, and with cross to aarch64-linux.
* lib/gcc-dg.exp (gdb-exists): Do not use [transform gdb]. Run
selected GDB with -v on target rather than testing for existence
on build system.
* lib/gcc-simulate-thread.exp (simulate-thread): Do not return
early for non-native and remote. Download executable and GDB
command file to target before running GDB there, and delete when
closing target.
Joseph Myers [Wed, 14 Jan 2026 17:09:40 +0000 (17:09 +0000)]
testsuite: Fix issues with cross testing in guality tests
The guality tests expect to run (native) GDB on the target. If this
is available, there is some support for cross testing, but with
various defects and limitations, some of them fixed here.
* GUALITY_GDB_NAME doesn't get passed through to the target for remote
testing (a general limitation of the DejaGnu interface: it doesn't
support setting environment variables on the target). Not fixed
here.
* Using in-tree GDB is only appropriate when host = target, since
in-tree GDB runs on the host and the testsuite runs GDB on the
target. Fixed here. (Note that [isnative] isn't used because that
refers to build = target, and we need host = target here.)
* [transform gdb] is not appropriate because that's a cross-GDB and
the tests run GDB on the target, so need a native GDB. Fixed here.
* gdb-test (used by some guality tests) exits early in cross and
remote cases (whereas the tests running GDB directly from the test
itself via popen can already do so on the target without needing
further patches). Fixed here. There are various other fixes done
in gdb-test as well; it needs to transfer files to the target then
delete them afterwards.
* report_gdb expects to run GDB on the host when the tests run it on
the target. Fixed here.
Note: some similar fixes will also be needed for simulate-thread tests
to get them working for cross testing, but I haven't done those yet.
Tested for x86_64-pc-linux-gnu to make sure native testing isn't
broken, and with cross to aarch64-linux.
* lib/gcc-gdb-test.exp (gdb-test): Do not return early for
non-native and remote. Download executable and GDB command file
to target before running GDB there, and delete when closing
target.
(report_gdb): Use target when testing GDB availability and
version.
* g++.dg/guality/guality.exp: Only use in-tree GDB when host =
target. Do not use [transform gdb].
* gcc.dg/guality/guality.exp: Likewise.
* gfortran.dg/guality/guality.exp: Likewise.
David Malcolm [Wed, 14 Jan 2026 16:49:12 +0000 (11:49 -0500)]
c++: UX improvements for close matches in print_candidates
This patch improves the UX for various cases of "no declaration matches"
where print_candidates encounters a close match.
For example, consider the const vs non-const here:
class foo
{
public:
void test (int i, int j, void *ptr, int k);
};
// Wrong "const"-ness of param 3.
void foo::test (int i, int j, const void *ptr, int k)
{
}
where we emit (with indentation provided by the prior patch):
test.cc:8:6: error: no declaration matches ‘void foo::test(int, int, const void*, int)’
8 | void foo::test (int i, int j, const void *ptr, int k)
| ^~~
• there is 1 candidate
• candidate is: ‘void foo::test(int, int, void*, int)’
test.cc:4:8:
4 | void test (int i, int j, void *ptr, int k);
| ^~~~
test.cc:1:7: note: ‘class foo’ defined here
1 | class foo
| ^~~
which requires the user to look through the pairs of parameters
and try to find the mismatch by eye.
This patch adds notes identifying that parameter 3 has the mismatch, and
what the mismatch is, using a pair of colors to highlight and contrast
the type mismatch.
test.cc:8:6: error: no declaration matches ‘void foo::test(int, int, const void*, int)’
8 | void foo::test (int i, int j, const void *ptr, int k)
| ^~~
• there is 1 candidate
• candidate is: ‘void foo::test(int, int, void*, int)’
test.cc:4:8:
4 | void test (int i, int j, void *ptr, int k);
| ^~~~
• parameter 3 of candidate has type ‘void*’...
test.cc:4:34:
4 | void test (int i, int j, void *ptr, int k);
| ~~~~~~^~~
• ...which does not match type ‘const void*’
test.cc:8:43:
8 | void foo::test (int i, int j, const void *ptr, int k)
| ~~~~~~~~~~~~^~~
test.cc:1:7: note: ‘class foo’ defined here
1 | class foo
| ^~~
This also works for the "this" case, improving the UX for messing up
const vs non-const between decls and defns of member functions; see
bad-fndef-2.C for an example.
For screenshots showing the colorization effect, see slides 9-12 of my
Cauldron talk:
https://gcc.gnu.org/wiki/cauldron2025#What.27s_new_with_diagnostics_in_GCC_16
("Hierarchical diagnostics (not yet in trunk)").
gcc/cp/ChangeLog:
* call.cc (get_fndecl_argument_location): Use DECL_SOURCE_LOCATION
for "this".
* cp-tree.h (class candidate_context): New.
(print_candidates): Add optional candidate_context param.
* decl2.cc: Include "gcc-rich-location.h" and
"tree-pretty-print-markup.h".
(struct fndecl_signature): New.
(class parm_rich_location): New.
(class fndecl_comparison): New.
(class decl_mismatch_context): New.
(check_classfn): For the "no declaration matches" case, pass an
instance of a custom candidate_context subclass to
print_candidates, using fndecl_comparison to report on close
matches.
* pt.cc (print_candidates): Add optional candidate_context param.
Use it if provided to potentially emit per-candidate notes.
gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/bad-fndef-1.C: Add directives to expect
"void *" vs "const void *" notes about parameter 3 of the close
candidate.
* g++.dg/diagnostic/bad-fndef-2.C: New test.
* g++.dg/diagnostic/bad-fndef-3.C: New test.
* g++.dg/diagnostic/bad-fndef-4.C: New test.
* g++.dg/diagnostic/bad-fndef-5.C: New test.
* g++.dg/diagnostic/bad-fndef-6.C: New test.
* g++.dg/diagnostic/bad-fndef-7.C: New test.
* g++.dg/diagnostic/bad-fndef-7b.C: New test.
* g++.dg/diagnostic/bad-fndef-8.C: New test.
* g++.dg/diagnostic/bad-fndef-9.C: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Wed, 14 Jan 2026 16:43:57 +0000 (11:43 -0500)]
c++: use nesting and counts in print_candidates
In r15-6116-gd3dd24acd74605 I updated print_z_candidates to print a
count of the number of candidates, and to show the number of each
candidate in the list if there is more than one.
The following patch updates print_candidates to work in a similar
way, showing counts, numbering, and using nesting.
Consider this test case for which we print 2 candidates:
class foo
{
public:
void test (int i, int j, void *ptr, int k);
void test (int i, int j, int k);
};
// Wrong "const"-ness of a param, for one of the overloads (param 3).
void foo::test (int i, int j, const void *ptr, int k)
{
}
The output before the patch is:
test.cc:9:6: error: no declaration matches ‘void foo::test(int, int, const void*, int)’
9 | void foo::test (int i, int j, const void *ptr, int k)
| ^~~
test.cc:5:8: note: candidates are: ‘void foo::test(int, int, int)’
5 | void test (int i, int j, int k);
| ^~~~
test.cc:4:8: note: ‘void foo::test(int, int, void*, int)’
4 | void test (int i, int j, void *ptr, int k);
| ^~~~
test.cc:1:7: note: ‘class foo’ defined here
1 | class foo
| ^~~
With the patch, the output looks like:
test.cc:9:6: error: no declaration matches ‘void foo::test(int, int, const void*, int)’
9 | void foo::test (int i, int j, const void *ptr, int k)
| ^~~
• there are 2 candidates
• candidate 1: ‘void foo::test(int, int, int)’
test.cc:5:8:
5 | void test (int i, int j, int k);
| ^~~~
• candidate 2: ‘void foo::test(int, int, void*, int)’
test.cc:4:8:
4 | void test (int i, int j, void *ptr, int k);
| ^~~~
test.cc:1:7: note: ‘class foo’ defined here
1 | class foo
| ^~~
which I believe is much more readable.
I dabbled with removing the "there is 1 candidate" line for the case of
a single candidate, but I think I prefer it to be present.
FWIW I've been experimenting with followups that
* show more nested information about the problems (e.g. the "void *"
vs "const void *" mismatch) - having the candidates be nested is a
useful step towards that
* potentially look at the "edit distance" of the type signatures to find
close matches, and perhaps reordering/highlighting them (e.g. in the
above candidate 2 is arguably a closer match than candidate 1, due to
the "const" snafu) - gathering an auto_vec might help with that.
gcc/cp/ChangeLog:
* call.cc (print_z_candidates): Move inform_n call into a new
inform_num_candidates function.
* class.cc (check_methods): Pass location to call to
print_candidates.
(resolve_address_of_overloaded_function): Likewise.
* cp-tree.h (print_candidates): Add location_t param.
(inform_num_candidates): New decl.
* decl.cc (make_typename_type): Pass location to call to
print_candidates.
(reshape_init_class): Likewise.
(lookup_and_check_tag): Likewise.
* decl2.cc (check_classfn): Likewise.
* error.cc (qualified_name_lookup_error): Likewise.
* init.cc (build_new_1): Likewise.
* name-lookup.cc (lookup_using_decl): Likewise.
(set_decl_namespace): Likewise.
(push_namespace): Likewise.
* parser.cc (cp_parser_nested_name_specifier_opt): Likewise.
(cp_parser_lookup_name): Likewise.
* pt.cc (print_candidates_1): Drop, converting the looping part
into...
(flatten_candidates): ...this new function.
(inform_num_candidates): New function.
(print_candidates): Use flatten_candidates to build an auto_vec
of candidates, and use this to print them here, rather than in
print_candidates_1. Eliminate the dynamic allocation of spaces
for a prefix in favor of printing "candidate %i" when there is
more than one candidate. Add "error_loc" param and pass it to
inform_num_candidates to show a heading, and add nesting levels
for it and for the candidate notes.
(determine_specialization): Pass location to calls to
print_candidates.
* search.cc (lookup_member): Likewise.
* semantics.cc (finish_id_expression_1): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/inline-ns2.C: Make dg-message directives non-empty.
* g++.dg/cpp23/explicit-obj-lambda11.C: Prune the extra note.
* g++.dg/diagnostic/bad-fndef-1.C: New test.
* g++.dg/lookup/decl1.C: Give the dg-message directives different
messages.
* g++.dg/lookup/using17.C: Update expected output.
* g++.dg/parse/non-dependent2.C: Likewise.
* g++.old-deja/g++.other/lineno2.C: Give the dg-message directives
different messages.
* g++.old-deja/g++.pt/t37.C: Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jakub Jelinek [Wed, 14 Jan 2026 16:09:13 +0000 (17:09 +0100)]
c++: Don't ICE on computed goto in potential_constant_expression_1 [PR123551]
r16-968-g5c6364b09a6 has added if (DECL_ARTIFICIAL (*target)) return true;
stmt for GOTO_EXPRs. This unfortunately ICEs if *target is not a decl,
which is the case for computed gotos. For those we should always reject
them, so the following patch additionally checks for LABEL_DECL
before testing DECL_ARTIFICIAL on it.
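For readers unfamiliar with the construct, the following is a small sketch of a computed goto, written in C rather than taken from the PR's C++ testcase. The function name dispatch and its labels are made up for illustration. Both the && label-address operator and goto * are GNU extensions, so this assumes GCC or a compatible compiler.

```c
/* Return 10 for op == 0 and 20 otherwise, dispatching through a table
   of label addresses.  Both &&label and goto *expr are GNU extensions.  */
int
dispatch (int op)
{
  static void *const targets[] = { &&on_zero, &&on_other };

  /* The operand of this goto is an expression, not a label name.  */
  goto *targets[op != 0];

on_zero:
  return 10;
on_other:
  return 20;
}
```

The operand of goto here is an arbitrary pointer expression, which is why code that assumes the target is a label declaration cannot handle it.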
2026-01-14 Jakub Jelinek <jakub@redhat.com>
PR c++/123551
* constexpr.cc (potential_constant_expression_1) <case GOTO_EXPR>:
Only test DECL_ARTIFICIAL on LABEL_DECLs.
Wilco Dijkstra [Tue, 13 Jan 2026 16:21:05 +0000 (16:21 +0000)]
AArch64: Use anchors for FP constants [PR 121240]
Use anchors for FP constants - instead of using mergeable sections, which block
anchors, load FP constants from the constdata section. To avoid the anchor
loads being deoptimized later, ensure the cost of a CONST_DOUBLE is larger than
the cost of a MEM that loads it from constdata. Codesize is slightly smaller,
performance on SPECFP2017 is ~0.30% better.
gcc:
PR target/121240
* config/aarch64/aarch64.md (mov<mode>): Expand FP immediates early.
* config/aarch64/aarch64.cc (aarch64_select_rtx_section): Force
immediates <= 8 bytes to constdata.
(aarch64_rtx_costs): Increase cost of CONST_DOUBLE loaded from memory.
gcc:
PR target/114528
* config/aarch64/aarch64.cc (aarch64_check_mov_add_imm12):
New function to check and emit MOV+ADD/SUB immediates.
(aarch64_internal_mov_immediate): Add support for MOV+ADD/SUB
immediates.
gcc/testsuite:
PR target/114528
* gcc.target/aarch64/pr114528.c: New test.
Tejas Belagod [Tue, 13 Jan 2026 16:58:38 +0000 (16:58 +0000)]
expand: Handle variable-length vector constructors with debug [PR123392]
Variable-length vector initializer constructors currently only work in
non-debug mode; they ICE when compiled with -g. This patch fixes it to handle
variable-length vector initialization by limiting the constructor elements to
the lower bound of the variable length poly which is also the maximum number
of elements allowed in the initializer.
Thomas Schwinge [Fri, 30 May 2025 09:37:46 +0000 (11:37 +0200)]
Add 'libgomp.c++/target-std__[...]-concurrent-usm.C' test cases for C++ 'std::unordered_map', 'std::unordered_multimap', 'std::unordered_multiset', 'std::unordered_set'
Thomas Schwinge [Fri, 30 May 2025 09:37:46 +0000 (11:37 +0200)]
Fix up 'libgomp.c++/target-std__[...]-concurrent-usm.C' dynamic memory allocation
OpenMP/USM implies memory accessible from host as well as device, but doesn't
imply that allocation vs. deallocation may be done in the opposite context.
For most of the test cases, (by construction) we're not allocating memory
during device execution, so have nothing to clean up. (..., but still document
these semantics.) But for a few, we have to clean up:
'libgomp.c++/target-std__map-concurrent-usm.C',
'libgomp.c++/target-std__multimap-concurrent-usm.C',
'libgomp.c++/target-std__multiset-concurrent-usm.C',
'libgomp.c++/target-std__set-concurrent-usm.C'.
For 'libgomp.c++/target-std__multimap-concurrent-usm.C' (only), this issue
already got addressed in commit 90f2ab4b6e1463d8cb89c70585e19987a58f3de1
"libgomp.c++/target-std__multimap-concurrent.C: Fix USM memory freeing".
However, instead of invoking the 'clear' function (which doesn't generally
guarantee to release dynamically allocated memory; for example, see PR123582
"C++ unordered associative container: dynamic memory management"), we properly
restore the respective object into pristine state.
Thomas Schwinge [Fri, 9 May 2025 13:09:51 +0000 (15:09 +0200)]
libgomp: Add a few more OpenMP/USM test cases
... where there are clear differences in behavior for OpenMP/USM run-time
configurations.
We shall further clarify all the intended semantics, once the implementation
begins to differentiate OpenMP 'requires unified_shared_memory' vs.
'requires self_maps'.
Jakub Jelinek [Wed, 14 Jan 2026 14:56:29 +0000 (15:56 +0100)]
defaults: Use argument in default EH_RETURN_DATA_REGNO definition [PR123115]
All targets use the EH_RETURN_DATA_REGNO macro argument except for
NVPTX which uses the default.
The problem is that we get then -Wunused-but-set-variable warning
when building df-scan.cc for NVPTX target with GCC 16 (post r16-2258
PR44677) on:
  unsigned int i;
  /* Mark the registers that will contain data for the handler. */
  for (i = 0; ; ++i)
    {
      unsigned regno = EH_RETURN_DATA_REGNO (i);
      if (regno == INVALID_REGNUM)
        break;
If it were multiple targets suffering from this, I'd think about
adding something to use i in loops like this, but as it is
just the default definition, the following patch fixes it by
using the argument.
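A minimal sketch of the idea, using stand-in macro names (EH_RETURN_DATA_REGNO_OLD, EH_RETURN_DATA_REGNO_NEW) and a stand-in INVALID_REGNUM value that are assumptions for illustration, not the actual defaults.h text: the comma expression evaluates and discards the argument, so the loop counter counts as used.

```c
/* Stand-in for GCC's INVALID_REGNUM; the value here is an assumption.  */
#define INVALID_REGNUM (~0u)

/* Old-style default: the argument is ignored, so a variable that is only
   ever passed to the macro looks set but unused.  */
#define EH_RETURN_DATA_REGNO_OLD(N) INVALID_REGNUM

/* New-style default: evaluate N, discard it, and yield INVALID_REGNUM.  */
#define EH_RETURN_DATA_REGNO_NEW(N) ((void) (N), INVALID_REGNUM)

/* Same loop shape as the df-scan.cc excerpt: count registers until the
   macro reports INVALID_REGNUM.  */
unsigned int
count_eh_data_regs (void)
{
  unsigned int i;
  unsigned int count = 0;
  for (i = 0; ; ++i)
    {
      unsigned int regno = EH_RETURN_DATA_REGNO_NEW (i);
      if (regno == INVALID_REGNUM)
        break;
      ++count;
    }
  return count;
}
```

Both forms yield the same value; only the second one mentions its argument in the expansion.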
2026-01-14 Jakub Jelinek <jakub@redhat.com>
PR middle-end/123115
* defaults.h (EH_RETURN_DATA_REGNO): Add void (N) to the macro
definition inside of a comma expression before INVALID_REGNUM.
Jakub Jelinek [Wed, 14 Jan 2026 14:53:44 +0000 (15:53 +0100)]
combine: Partially revert the r12-4475 changes [PR120250]
The r12-4475 change added extra code to recog_for_combine to attempt to
force some constants into the constant pool.
Unfortunately, as this (UB at runtime) testcase shows, such changes are
harmful for computed_jump_p jumps. The computed_jump_p returns false
for loads from constant pool MEMs:
case MEM:
return ! (GET_CODE (XEXP (x, 0)) == SYMBOL_REF
&& CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)));
and so if we try to optimize a computed jump that way, it becomes
a non-computed jump which doesn't match any other jump category
(simplejump_p, tablejump_p, condjump_p, returnjump_p, eh_returnjump_p,
asm goto) and doesn't have any label recorded in JUMP_LABEL (because
it doesn't really jump to any LABEL), so some passes like dwarf2cfi
can get confused about it and ICE.
The following patch just prevents that, by only doing the r12-4475
changes if it is not a jump.
2026-01-14 Jakub Jelinek <jakub@redhat.com>
PR target/120250
* combine.cc (recog_for_combine): Don't try to put SET_SRC
into a constant pool if SET_DEST is pc_rtx.
Richard Biener [Wed, 14 Jan 2026 11:45:19 +0000 (12:45 +0100)]
tree-optimization/123190 - fix costing of permuted contiguous loads
The following fixes a regression from the time we split load groups
along SLP boundaries. When we face a permuted load from an access
that is contiguous across loop iterations we emit code that loads
the whole group and then emit required permutations. The permutations
might not need all those loads, and if we split the group we would
not have emitted them. Fortunately when analyzing a permutation
we compute both the number of required permutes and the number of
loads that will survive the following DCE. So make sure to use that
when costing. This allows the previously added testcase for PR123190
to undergo epilog vectorization also at -O2 plus when using non-generic
tuning, such as tuning for Zen4 which ups the cost for XMM loads.
PR tree-optimization/123190
* tree-vectorizer.h (vect_load_store_data): Add n_loads member.
* tree-vect-stmts.cc (get_load_store_type): Record the
number of required loads for permuted loads.
(vectorizable_load): Make use of this when costing loads
for VMAT_CONTIGUOUS[_REVERSE].
* gcc.dg/vect/costmodel/x86_64/costmodel-pr123190-1.c: Do not
require -mtune=generic.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr123190-2.c: Add
variant with -O2 instead of -O3, inner loop not unrolled.
The following adjusts the condition where we reject vectorization
because the scalar loop runs only for a single iteration (or two,
in case we need to peel for gaps). This is over-eager
when considering the case of VF == 1, where the cost model
should instead decide whether it is worthwhile or not. I'm playing
conservative here and exclude the case of two iterations as I
do not have benchmark evidence.
This helps fixing a regression observed with improved SLP handling,
not exactly for the options used in the PR though, but for a more
common -O3 -march=x86-64-v3 this speeds up 433.milc by 6%.
PR tree-optimization/123190
* tree-vect-loop.cc (vect_analyze_loop_costing): Allow
vectorizing loops with a single scalar iteration iff the
vectorization factor is 1.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr123190-1.c: New testcase.
* gcc.dg/vect/slp-28.c: Avoid epilogue vectorization for
simplicity.
Jakub Jelinek [Wed, 14 Jan 2026 12:21:57 +0000 (13:21 +0100)]
simplify-rtx: Fix up SUBREG and LSHIFTRT order canonicalization for AND with constant [PR123544]
On Tue, Nov 04, 2025 at 12:59:03PM +0530, Kishan Parmar wrote:
> PR rtl-optimization/93738
> * simplify-rtx.cc (simplify_binary_operation_1): Canonicalize
> SUBREG(LSHIFTRT) into LSHIFTRT(SUBREG) when valid.
This change regressed the following testcase on aarch64-linux.
From what I can see, the PR93738 change has been written with non-paradoxical
SUBREGs in mind but on this testcase on aarch64 we have a paradoxical SUBREG,
in particular simplify_binary_operation_1 is called with AND, SImode,
  (subreg:SI (lshiftrt:HI (subreg:HI (reg/v:SI 108 [ x ]) 0)
                          (const_int 8 [0x8])) 0)
and op1 (const_int 32767 [0x7fff]) and simplifies that since the PR93738
optimization was added into
  (and:SI (lshiftrt:SI (reg/v:SI 108 [ x ])
                       (const_int 8 [0x8]))
          (const_int 32767 [0x7fff]))
This looks wrong to me.
Consider that (reg/v:SI 108 [ x ]) could have value 0x12345678U.
The original expression takes lowpart 16-bits from that, i.e. 0x5678U,
shifts that right logically by 8 bits, so 0x56U, makes a paradoxical SUBREG
from that, i.e. 0x????0056U and masks that with 0x7fff, i.e. result is 0x56U.
The new expression shifts 0x12345678U logically right by 8 bits, i.e. 0x123456U and
masks it by 0x7fff, result 0x3456U.
Thus, I think we need to limit to non-paradoxical SUBREGs.
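The arithmetic above can be checked with a short C sketch that models the two RTL expressions with fixed-width integers. The function names are hypothetical. The undefined upper bits of the paradoxical SUBREG are modelled as zero, which is an assumption; the 0x7fff mask only keeps bits that are defined, so the choice does not affect the result.

```c
#include <stdint.h>

/* Original expression: take the low 16 bits, shift them right by 8 as a
   16-bit value, widen to 32 bits, then mask with 0x7fff.  */
uint32_t
mask_original (uint32_t x)
{
  uint16_t low = (uint16_t) x;
  uint16_t shifted = (uint16_t) (low >> 8);
  return (uint32_t) shifted & 0x7fffu;
}

/* Expression after the problematic canonicalization: shift the full
   32-bit value right by 8, then mask with 0x7fff.  */
uint32_t
mask_miscanonicalized (uint32_t x)
{
  return (x >> 8) & 0x7fffu;
}
```

For x = 0x12345678 the first function returns 0x56 and the second returns 0x3456, matching the values worked out in the text.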
On the rlwimi-2.c testcase I see on powerpc64le-linux no differences in
emitted assembly without/with the patch.
2026-01-14 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/123544
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
<case AND>: Don't canonicalize (subreg (lshiftrt (x cnt)) low) into
(lshiftrt (subreg x low) cnt) if the SUBREG is paradoxical.
Enable time profile function reordering with AutoFDO.
The patch enables time profile based reordering with AutoFDO with
-fauto-profile -fprofile-reorder-functions, by mapping timestamps obtained from perf
into node->tp_first_run.
The rationale for doing this is:
(1) GCC already implements time-profile function reordering with PGO, the patch enables
it with AutoFDO.
(2) While time profile ordering is primarily meant for optimizing startup time,
we've also observed good effects on code-locality for large internal workloads.
(3) Possibly useful for function reordering when accurate profile annotation is
hard with AutoFDO, e.g. if branch samples are missing (due to absence of an
LBR-like structure).
On AutoFDO tools side, a corresponding patch extends gcov to emit 64-bit perf timestamp that
records first execution of function, which loosely corresponds to PGO's time_profile counter.
The timestamp is stored adjacent to head field in toplevel function info.
On GCC side, this patch makes the following changes:
(1) Changes to auto-profile pass:
The patch adds a new field timestamp to function_instance,
and populates it in read_function_instance.
It maintains a new timestamp_info_map from timestamp -> <name, tp_first_run>,
which maps timestamps sorted in ascending order to (1..N), so lowest ordered
timestamp is mapped to 1 and so on. The rationale for this is that
timestamps are 64-bit integers, and we don't need the full 64-bit range
for ordering by tp_first_run.
During annotation, the timestamp associated with function_instance is looked up
in timestamp_info_map, and corresponding mapped value is assigned
to node->tp_first_run.
Dhruv's sourcefile tracking patch already handles LTO privatized symbols.
The patch adds a workaround for mismatched/empty filenames, which should go away
when the issues with AutoFDO tools dwarf parsing are resolved.
(2) Param to disable profile driven opts.
The patch adds param auto-profile-reorder-only which only enables time-profile reordering with
AutoFDO:
(a) Useful as a debugging aid to isolate regression to either function reordering or profile driven opts.
(b) As a stopgap measure to avoid regressions with AutoFDO profile driven opts.
(c) Possibly useful for architectures which do not support branch sampling.
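The timestamp-to-rank mapping described in (1) can be sketched in C as follows. This is only an illustration under stated assumptions: struct func_sample and assign_first_run_ranks are hypothetical names, the real code keeps the data in function_instance and a std::map, and the sketch assumes all timestamps are distinct.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record; in the patch the data lives in function_instance
   and timestamp_info_map.  */
struct func_sample
{
  const char *name;
  uint64_t timestamp;   /* 64-bit perf timestamp of first execution.  */
  int tp_first_run;     /* Compact rank, 1 = earliest.  */
};

/* Order two samples by ascending timestamp without overflowing int.  */
static int
compare_by_timestamp (const void *a, const void *b)
{
  const struct func_sample *x = a;
  const struct func_sample *y = b;
  return (x->timestamp > y->timestamp) - (x->timestamp < y->timestamp);
}

/* Sort FUNCS by timestamp and map the sorted timestamps to 1..N.  */
void
assign_first_run_ranks (struct func_sample *funcs, size_t n)
{
  qsort (funcs, n, sizeof funcs[0], compare_by_timestamp);
  for (size_t i = 0; i < n; ++i)
    funcs[i].tp_first_run = (int) (i + 1);
}
```

Only the relative order of the timestamps survives, which is all that ordering by tp_first_run needs.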
gcc/ChangeLog:
* auto-profile.cc: (string_table::filenames): New method.
(function_instance::timestamp_): New member.
(function_instance::timestamp): New accessor for timestamp_ member.
(function_instance::set_timestamp): New function.
(function_instance::prop_timestamp): Likewise.
(function_instance::prop_timestamp_1): Likewise.
(function_instance::function_instance): Initialize timestamp_ to 0.
(function_instance::read_function_instance): Adjust prototype by
replacing head_count with toplevel param with default value true, and
stream in head_count and timestamp values from gcov file.
(autofdo::timestamp_info_map): New std::map.
(autofdo_source_profile::get_function_instance_by_decl): New argument
filename with default value NULL.
(autofdo_source_profile::read): Populate timestamp_info_map and
propagate timestamp to inlined instances from toplevel function.
(afdo_annotate_cfg): Assign node->tp_first_run based on
timestamp_info_map and bail out of annotation if
param_auto_profile_reorder_only is enabled.
* params.opt: New param auto-profile-reorder-only.
During ML discussions of a match.pd pattern that was introducing a new
instance of 'warn_strict_overflow', Richard mentioned that this use
should be discouraged [1]. After pointing out that this usage was
documented in tree.h he then explained that we should remove the note
from the header [2]. Here's the reasoning:
"Ah, we should remove that note. -Wstrict-overflow proved useless IMO,
it's way too noisy as it diagnoses when the compiler relies on overflow
not happening, not diagnosing when it possibly happens. That's not a
very useful diagnostic to have - it does not point to a possible problem
in the code (we could as well diagnose _all_ signed arithmetic
operations for the same argument that we might eventually rely on
overflow not happening)."
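To illustrate what "relies on overflow not happening" means, here is a hedged C sketch of the kind of comparison fold involved: rewriting x - 10 <= 20 as x <= 30. The function names are hypothetical, and the exact match.pd pattern is not reproduced here. The signed pair agrees wherever the subtraction does not overflow, while the unsigned pair shows that the rewrite would be wrong if wrap-around had to be honoured.

```c
#include <stdbool.h>

/* Signed forms: equivalent as long as x - 10 does not overflow, which
   the compiler may assume because signed overflow is undefined.  */
bool signed_original (int x) { return x - 10 <= 20; }
bool signed_rewritten (int x) { return x <= 30; }

/* Unsigned forms: wrap-around is defined, so the same rewrite changes
   the result for small x.  */
bool unsigned_original (unsigned int x) { return x - 10u <= 20u; }
bool unsigned_rewritten (unsigned int x) { return x <= 30u; }
```

With x = 5 the two signed functions agree, but the unsigned ones differ because 5 - 10 wraps to a large value.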
Aside from removing the tree.h note we're also removing the 2 references
in match.pd. match.pd patterns tend to be copied around to serve as a
base for new patterns (like I did in [3] adding a
'fold_overflow_warning'), and if we want to discourage the use avoiding
its spread is a good start.
Note that there are a lot of references left, most of them in
gcc/fold-const.cc. Some references are used in nested helpers inside
the file, entangled with code that does other things. Removing all
references from the project is out of scope for this quick patch.
* match.pd: remove 'fold_overflow_warning' references.
* tree.h (TYPE_OVERFLOW_UNDEFINED): remove note telling
that we must use warn_strict_overflow for every optimization
based on TYPE_OVERFLOW_UNDEFINED.
gcc/testsuite/ChangeLog:
* gcc.dg/Wstrict-overflow-1.c: Removed because we no longer
issue a 'fold_overflow_warning' with the
`(le (minus (@0 INTEGER_CST@1)) INTEGER_CST@2)` pattern.
Signed-off-by: Daniel Barboza <daniel.barboza@oss.qualcomm.com>
Andrew Pinski [Tue, 13 Jan 2026 23:21:56 +0000 (15:21 -0800)]
match: Remove redundant type checks from `(T1)(a bit_op (T2)b)` pattern.
As mentioned in https://gcc.gnu.org/pipermail/gcc-patches/2026-January/705657.html,
there were some redundant checks in this pattern. In the first if,
the check for pointer and OFFSET_TYPE is redundant as there is a check for
INTEGRAL_TYPE_P beforehand. For the second one, the check for INTEGRAL_TYPE_P
on the innermost type is not needed as there is a types_match right afterwards.
Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd (`(T1)(a bit_op (T2)b)`): Remove redundant
type checks.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Nathaniel Shead [Fri, 9 Jan 2026 10:36:32 +0000 (21:36 +1100)]
c++: modules and coroutines
While working on another issue I found that currently modules do not
work with coroutines at all. This patch fixes a number of issues in
both the coroutines logic and modules logic to ensure that they play
well together. To summarize:
- The coroutine proxy objects did not have a DECL_CONTEXT set (required
for modules to merge declarations).
- The coroutine transformation functions are always non-inline, even
for an inline ramp function, which means that modules need an override
to ensure the definitions are available where needed.
- Coroutine transformation functions were not marked DECL_COROUTINE_P,
despite accessors implying that they were.
- In an importing TU we had lost the connection between the ramp
functions and the transform functions, as they were kept in a pair
of global maps.
- Modules streaming couldn't discriminate between the actor or destroy
functions when merging.
- Modules streaming wasn't setting the cfun->coroutine_component flag,
needed to activate the middle-end coroutine lowering pass.
This patch also separates the coroutine_info_table initialization from
the ensure_coro_initialized function. If the first time we see a
coroutine is from a module import, we need to register the
transformation functions now but calling ensure_coro_initialized would
lookup e.g. std::coroutine_traits, which may only be visible from this
module that we're currently reading, causing a recursive load.
Separating the concerns allows this to work correctly.
gcc/cp/ChangeLog:
* coroutines.cc (create_coroutine_info_table): New function.
(get_or_insert_coroutine_info): Mark static.
(ensure_coro_initialized): Likewise; use
create_coroutine_info_table.
(coro_promise_type_found_p): Set DECL_CONTEXT for proxies.
(coro_set_ramp_function): New function.
(coro_set_transform_functions): New function.
(coro_build_actor_or_destroy_function): Use
coro_set_ramp_function, mark as DECL_COROUTINE_P.
* cp-tree.h (coro_set_transform_functions): Declare.
(coro_set_ramp_function): Declare.
* module.cc (struct merge_key): New field coro_disc.
(dumper::impl::nested_name): Distinguish coroutine transform
functions.
(get_coroutine_discriminator): New function.
(trees_out::key_mergeable): Stream coroutine discriminator.
(check_mergeable_decl): Adjust comment, check for matching
coroutine discriminator.
(trees_in::key_mergeable): Read coroutine discriminator.
(has_definition): Override for coroutine transform functions.
(trees_out::write_function_def): Stream linked ramp, actor, and
destroy functions for coroutines.
(trees_in::read_function_def): Read them.
(module_state::read_cluster): Set cfun->coroutine_component.
gcc/testsuite/ChangeLog:
* g++.dg/modules/coro-1_a.C: New test.
* g++.dg/modules/coro-1_b.C: New test.
Nathaniel Shead [Sat, 10 Jan 2026 02:52:37 +0000 (13:52 +1100)]
c++/modules: Update lang_decl_bool streaming
The set of lang_decl flags that we were streaming had gotten out of sync
with the current list; update them.
One notable change is that anticipated_p, which had previously been
deliberately skipped, is now only used for DECL_OMP_PRIVATIZED_MEMBER,
and so should probably be streamed as well.
gcc/cp/ChangeLog:
* module.cc (trees_out::lang_decl_bools): Update list of flags.
(trees_in::lang_decl_bools): Likewise.
Reviewed-by: Jason Merrill <jason@redhat.com>
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Andrew Pinski [Tue, 13 Jan 2026 02:58:47 +0000 (18:58 -0800)]
match: Add simplification of `(a*zero_one_valued_p) & b` if `a & b` simplifies [PR119402]
This is a small reassociation of `a*bool & b` into `(a & b) * bool`, applied
only if `a & b` simplifies. That can happen when `b` is `~a`, `a`, or something
else that simplifies when ANDed with `a`.
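The identity behind this reassociation can be checked with a small standalone C sketch. The function names below are hypothetical and are not part of GCC; `flag` stands in for any operand that is known to be 0 or 1.

```c
#include <assert.h>

/* Original form: (a * flag) & b, where flag must be 0 or 1.  */
static unsigned mul_then_and(unsigned a, unsigned flag, unsigned b)
{
  assert(flag <= 1);
  return (a * flag) & b;
}

/* Reassociated form: (a & b) * flag, where flag must be 0 or 1.  */
static unsigned and_then_mul(unsigned a, unsigned flag, unsigned b)
{
  assert(flag <= 1);
  return (a & b) * flag;
}

/* Returns 1 if both forms agree on every pair of sample values
   and both flag values, 0 otherwise.  */
static int reassoc_forms_agree(void)
{
  static const unsigned samples[] = { 0u, 1u, 0x55u, 0xffu, 0xdeadbeefu, ~0u };
  const unsigned n = sizeof samples / sizeof samples[0];
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      for (unsigned flag = 0; flag <= 1; flag++)
        if (mul_then_and(samples[i], flag, samples[j])
            != and_then_mul(samples[i], flag, samples[j]))
          return 0;
  return 1;
}
```

In the reassociated form, a `b` equal to `~a` makes `a & b` fold to 0, so the whole expression becomes 0 regardless of `flag`.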
Note this fixes a regression for aarch64 where the cost of a multiply vs `&-` changed
in GCC 14 and can no longer optimize some cases at the RTL level.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/119402
gcc/ChangeLog:
* match.pd (`(a*zero_one_valued_p) & b`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bitops-14.c: New test.
* gcc.dg/tree-ssa/bitops-15.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
The problem here is that after some heuristics changes the check
loop is now unrolled so we eliminate the array. This means
the check for not having -2147483648 no longer works as
we don't handle SLP in this case.
So the best option is to force the check loop not to unroll
(no vectorize) as this is just testing we SLP the normal
signbit places rather than dealing with the checking loop.
Pushed as obvious after testing the testcase on aarch64-linux-gnu.
PR testsuite/122522
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/signbitv2sf.c (main): Disable
unrolling and vectorizer for the checking loop.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Martin Uecker [Tue, 13 Jan 2026 18:09:53 +0000 (19:09 +0100)]
c: fix checking ICE related to transparent unions and atomic [PR123309]
When matching function arguments in composite_type_internal and one
type comes from a transparent union, it is possible to end up with
atomic and non-atomic types because this case is not handled correctly.
The type matching logic is rewritten in a cleaner way to use helper
functions and to not walk the argument lists three times. With this
change, a checking assertion can be added to test for matching qualifiers
for pointers. (In general, this assumption is still violated for
function return types.)
PR c/123309
gcc/c/ChangeLog:
* c-typeck.cc (transparent_union_replacement): New function.
(composite_type_internal): Rewrite logic.
(type_lists_compatible_p): Remove dead code for NULL arguments.
gcc/testsuite/ChangeLog:
* gcc.dg/pr123309.c: New test.
* gcc.dg/union-composite-type.c: New test.
Tomasz Kamiński [Tue, 13 Jan 2026 15:29:42 +0000 (16:29 +0100)]
libstdc++: Fix handling iterators with proxy subscript in heap algorithms.
This patch replaces uses of subscripts in heap algorithms, that were introduced
in r16-4100-gaaeca77a79a9a8, with dereference of advanced iterators.
The Cpp17RandomAccessIterator requirements allow operator[] to return any
type that is convertible to reference, however user-provided comparators are
required only to accept the result of dereferencing the iterator (i.e. reference
directly). This is visible when the comparator defines an operator() for which
template arguments can be deduced from reference (which will fail on proxy)
or that accepts types convertible from reference (see included tests).
For the test we introduce a new proxy_random_access_iterator_wrapper iterator
in testsuite_iterators.h, that returns a proxy type from the subscript operator.
This is a separate type (instead of an additional template argument and aliases),
as it is used for tests that work with C++98.
libstdc++-v3/ChangeLog:
* include/bits/stl_heap.h (std::__is_heap_until, std::__push_heap)
(std::__adjust_heap): Replace subscript with dereference of
advanced iterator.
* testsuite/util/testsuite_iterators.h (__gnu_test::subscript_proxy)
(__gnu_test::proxy_random_access_iterator_wrapper): Define.
* testsuite/25_algorithms/sort_heap/check_proxy_brackets.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Andrew Pinski [Sat, 10 Jan 2026 07:14:22 +0000 (23:14 -0800)]
ifcvt: Improve `cmp?a&b:a` to try with -1 [PR123312]
After the current improvements to ifcvt, on some targets for
cmp?a&b:a it is better to produce `(cmp?b:-1) & a` rather than
`(!cmp?a:0)|(a & b)`. So this extends noce_try_cond_zero_arith (with
a rename to noce_try_cond_arith) to see if `cmp ? a : -1` is cheaper than
`!cmp?a:0`.
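The three forms mentioned above are equivalent, which a standalone C sketch can confirm. The function names are hypothetical and this is not the ifcvt.cc code; unsigned operands are used so that `~0u` plays the role of the all-ones value -1.

```c
#include <assert.h>

/* Source form: cmp ? (a & b) : a.  */
static unsigned cond_and_source(int cmp, unsigned a, unsigned b)
{
  return cmp ? (a & b) : a;
}

/* Form using all-ones, the identity element of AND:
   (cmp ? b : -1) & a.  */
static unsigned cond_and_all_ones(int cmp, unsigned a, unsigned b)
{
  return (cmp ? b : ~0u) & a;
}

/* Form using zero: (!cmp ? a : 0) | (a & b).  */
static unsigned cond_and_zero(int cmp, unsigned a, unsigned b)
{
  return (!cmp ? a : 0u) | (a & b);
}

/* Returns 1 if all three forms agree on the sample values.  */
static int cond_and_forms_agree(void)
{
  static const unsigned samples[] = { 0u, 1u, 0x0fu, 0xf0u, 0x12345678u, ~0u };
  const unsigned n = sizeof samples / sizeof samples[0];
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      for (int cmp = 0; cmp <= 1; cmp++)
        {
          unsigned want = cond_and_source(cmp, samples[i], samples[j]);
          if (cond_and_all_ones(cmp, samples[i], samples[j]) != want
              || cond_and_zero(cmp, samples[i], samples[j]) != want)
            return 0;
        }
  return 1;
}
```

The all-ones form needs one conditional select and one AND, while the zero form needs a select, an AND, and an OR; which is cheaper still depends on the target's costs.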
Bootstrapped and tested on x86_64-linux-gnu.
PR rtl-optimization/123312
gcc/ChangeLog:
* ifcvt.cc (noce_try_cond_zero_arith): Rename to ...
(noce_try_cond_arith): This. For AND try `cmp ? a : -1`
also to see which one costs less.
(noce_process_if_block): Handle the rename.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jeff Law [Tue, 13 Jan 2026 14:16:05 +0000 (07:16 -0700)]
[PR tree-optimization/123530] Fix ICE in recently added match.pd pattern
The gimple optimization passes can create negative shift counts and pass them
into the simplification routines as seen by the code in pr123530. If we then
call tree_to_uhwi on those values we get a nice little ICE.
This guards the tree_to_uhwi calls on tree_fits_uhwi_p and resolves the ICE. I
just protected them all in this recently added pattern.
Bootstrapped and regression tested on x86 and riscv. Also tested on the rest
of the embedded targets without any regressions.
Pushing to the trunk.
PR tree-optimization/123530
gcc/
* match.pd (reassociating xor to enable rotations): Verify constants
fit into a uhwi before trying to extract them as a uhwi.
gcc/testsuite/
* gcc.dg/torture/pr123530.c: New test.
Richard Biener [Tue, 13 Jan 2026 11:59:41 +0000 (12:59 +0100)]
middle-end/123573 - fix VEC_PERM folding more
The following fixes the fix from r16-6709-ga4716ece529dfd some
more by making sure permute to one operand folding faces same
element number vectors but also insert a VIEW_CONVERT_EXPR for
the case one is VLA and one is VLS (when the VLA case is actually
constant, like with -msve-vector-bits=128). It also makes the
assert that output and input element numbers match done in
fold_vec_perm which this pattern eventually dispatches to into
a check (as the comment already indicates).
Testcases are in the target specific aarch64 testsuite already.
PR middle-end/123573
* fold-const.cc (fold_vec_perm): Actually check, not assert,
that input and output vector element numbers agree.
* match.pd (vec_perm @0 @1 @2): Make sure element numbers
are the same when folding to an input vector and wrap that
inside a VIEW_CONVERT_EXPR.
Robin Dapp [Fri, 9 Jan 2026 12:25:40 +0000 (13:25 +0100)]
rtlanal: Determine nonzero bits of popcount from operand [PR123501].
The PR involves large mask vectors (e.g. V128BI) from which we take
the popcount. Currently a (popcount:DI (V128BI)) is assumed to have
at most 8 set bits as we assume the popcount operand also has DImode.
This patch uses the operand mode for unary operations and thus
calculates a proper nonzero-bits mask.
We could do the same estimate for ctz and clz but they use nonzero in a
non-poly way and I didn't want to change more than necessary. Therefore
the patch just returns -1 when we have a different operand mode for
ctz/clz.
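Why the operand width matters can be seen in a small C sketch: the popcount of a value with `width` bits is at most `width`, so the result fits in floor_log2(width) + 1 bits. The helper names below are hypothetical and this is not the rtlanal.cc implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit of x; x must be nonzero.  */
static unsigned floor_log2_u(unsigned x)
{
  assert(x != 0);
  unsigned r = 0;
  while (x >>= 1)
    r++;
  return r;
}

/* Mask of the bits that may be nonzero in the popcount of an
   operand that is WIDTH bits wide.  */
static uint64_t popcount_nonzero_mask(unsigned width)
{
  return (UINT64_C(2) << floor_log2_u(width)) - 1;
}
```

A mask derived from a 64-bit operand cannot represent a count of 128, which is the largest popcount of a 128-bit mask vector; a mask derived from the 128-bit operand can.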
Thomas Schwinge [Fri, 9 May 2025 13:05:57 +0000 (15:05 +0200)]
amdgcn: Adjust failure mode for gfx908 USM: 'libgomp.fortran/map-alloc-comp-9-usm.f90'
The change/rationale that commit 1cf9fda4936de54198858b8f54cd9707a3725f4e
"amdgcn: Adjust failure mode for gfx908 USM" applied to a number of test cases
likewise applies to 'libgomp.fortran/map-alloc-comp-9-usm.f90'.
libgomp/
* testsuite/libgomp.fortran/map-alloc-comp-9-usm.f90: Require
working Unified Shared Memory to run the test.
Thomas Schwinge [Mon, 15 Dec 2025 15:12:33 +0000 (16:12 +0100)]
openmp: Bump Version from 4.5 to 5.2 (2/4): Some more '-Wno-deprecated-openmp'
These changes should've been included in
commit 382edf047effcd5b1ce66389742bd1b3e178ac95
"openmp: Bump Version from 4.5 to 5.2 (2/4)", to avoid some more instances of:
warning: use of 'omp declare target' as a synonym for 'omp begin declare target' has been deprecated since OpenMP 5.2 [-Wdeprecated-openmp]
warning: 'to' clause with 'declare target' deprecated since OpenMP 5.2, use 'enter' [-Wdeprecated-openmp]
Warning: Non-C_PTR type argument at (1) is deprecated, use HAS_DEVICE_ADDR [-Wdeprecated-openmp]
Warning: 'to' clause with 'declare target' at (1) deprecated since OpenMP 5.2, use 'enter' [-Wdeprecated-openmp]
Thomas Schwinge [Mon, 15 Dec 2025 15:12:33 +0000 (16:12 +0100)]
openmp: Bump Version from 4.5 to 5.2 (2/4): 'libgomp.oacc-c-c++-common/vred2d-128.c' [PR123098]
'libgomp.oacc-c-c++-common/vred2d-128.c' had gotten '-Wno-deprecated-openmp'
applied as part of commit 382edf047effcd5b1ce66389742bd1b3e178ac95
"openmp: Bump Version from 4.5 to 5.2 (2/4)", which conceptually doesn't make
sense, as 'libgomp.oacc-c-c++-common/vred2d-128.c' isn't an OpenMP test case.
In commit 9c119b0fdd9ba5a6821c0b4c5874ade8f4969109
"openmp: Limit - reduction -Wdeprecated-openmp diagnostics to OpenMP, testsuite fixes [PR123098]",
the erroneous diagnostic got disabled, so we don't need
'-Wno-deprecated-openmp' anymore.
Jakub Jelinek [Tue, 13 Jan 2026 09:06:47 +0000 (10:06 +0100)]
Use -latomic_asneeded or -lgcc_s_asneeded to workaround libtool issues [PR123396]
On Mon, Jan 12, 2026 at 12:13:35PM +0100, Florian Weimer wrote:
> One way to work around the libtool problem would be to stick the
> as-needed into an existing .so linker script, or create a new one under
> a different name (say libatomic_optional.so) that has AS_NEEDED in it,
> and link with -latomic_optional. Then libtool would not have to be
> taught about --push-state/--pop-state etc.
That seems to work.
So far bootstrapped (c,c++,fortran,lto only) and make install tested
on x86_64-linux, tested on a small program without need for libatomic and
struct S { char a[25]; };
_Atomic struct S s;
int main () { struct S t = s; s = t; }
which does at -O0.
Before this patch I got
for i in `find x86_64-pc-linux-gnu/ -name lib\*.so.\*.\*`; do ldd -u $i 2>&1 | grep -q libatomic.so.1 && echo $i; done
x86_64-pc-linux-gnu/libsanitizer/ubsan/.libs/libubsan.so.1.0.0
x86_64-pc-linux-gnu/libsanitizer/asan/.libs/libasan.so.8.0.0
x86_64-pc-linux-gnu/libsanitizer/hwasan/.libs/libhwasan.so.0.0.0
x86_64-pc-linux-gnu/libsanitizer/lsan/.libs/liblsan.so.0.0.0
x86_64-pc-linux-gnu/libsanitizer/tsan/.libs/libtsan.so.2.0.0
x86_64-pc-linux-gnu/32/libsanitizer/ubsan/.libs/libubsan.so.1.0.0
x86_64-pc-linux-gnu/32/libsanitizer/asan/.libs/libasan.so.8.0.0
x86_64-pc-linux-gnu/32/libstdc++-v3/src/.libs/libstdc++.so.6.0.35
x86_64-pc-linux-gnu/libgcobol/.libs/libgcobol.so.2.0.0
x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6.0.35
With this patch it prints nothing.
2026-01-13 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/123396
gcc/
* configure.ac (gcc_cv_ld_use_as_needed_ldscript): New test.
(USE_LD_AS_NEEDED_LDSCRIPT): New AC_DEFINE.
* gcc.cc (LINK_LIBATOMIC_SPEC): Use "-latomic_asneeded" instead
of LD_AS_NEEDED_OPTION " -latomic " LD_NO_AS_NEEDED_OPTION
if USE_LD_AS_NEEDED_LDSCRIPT is defined.
(init_gcc_specs): Use "-lgcc_s_asneeded" instead of
LD_AS_NEEDED_OPTION " -lgcc_s " LD_NO_AS_NEEDED_OPTION
if USE_LD_AS_NEEDED_LDSCRIPT is defined.
* config.in: Regenerate.
* configure: Regenerate.
libatomic/
* acinclude.m4 (LIBAT_BUILD_ASNEEDED_SOLINK): New AM_CONDITIONAL.
* libatomic_asneeded.so: New file.
* libatomic_asneeded.a: New file.
* Makefile.am (toolexeclib_DATA): Set if LIBAT_BUILD_ASNEEDED_SOLINK.
(all-local): Install those files into gcc subdir.
* Makefile.in: Regenerate.
* configure: Regenerate.
libgcc/
* config/t-slibgcc (SHLIB_ASNEEDED_SOLINK,
SHLIB_MAKE_ASNEEDED_SOLINK, SHLIB_INSTALL_ASNEEDED_SOLINK): New
vars.
(SHLIB_LINK): Include $(SHLIB_MAKE_ASNEEDED_SOLINK).
(SHLIB_INSTALL): Include $(SHLIB_INSTALL_ASNEEDED_SOLINK).
Paul Thomas [Tue, 13 Jan 2026 08:19:05 +0000 (08:19 +0000)]
Fortran: Check constant PDT type specification parameters [PR112460]
2026-01-14 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/112460
* array.cc (resolve_array_list): Stash the first PDT element
and check its type specification parameters against those of
subsequent elements.
* expr.cc (get_parm_list_from_expr): New function to extract the
type spec lists from expressions to be compared.
(gfc_check_type_spec_parms): New function to compare type spec
lists between two expressions. Emit an error if any constant
values are different.
(gfc_check_assign): Check that the PDT type specification parms
are the same on lhs and rhs.
* gfortran.h: Add prototype for gfc_check_type_spec_parms.
* trans-expr.cc (copyable_array_p): PDT arrays are not copyable.
gcc/testsuite
PR fortran/112460
* gfortran.dg/pdt_81.f03: New test.
Richard Biener [Mon, 12 Jan 2026 13:10:32 +0000 (14:10 +0100)]
tree-optimization/123539 - signed UB in vector reduction
With previous changes I overlooked one use of vectype.
PR tree-optimization/123539
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Use the compute vectype to pun down to smaller or element
size for by-element reductions.
Andrew Pinski [Tue, 13 Jan 2026 05:06:49 +0000 (21:06 -0800)]
xfail store_merging_19.c for the same reason as store_merging_18.c
store_merging_19.c is almost the same as store_merging_18.c except
it has assume align in it to allow it to work on strict align targets.
Somehow when I was looking at the testresults I noticed 18 but not 19
when I was looking into failures.
Pushed as obvious.
gcc/testsuite/ChangeLog:
* gcc.dg/store_merging_19.c: xfail.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Kito Cheng [Mon, 29 Dec 2025 06:14:25 +0000 (14:14 +0800)]
VN: Fix VN ICE for large _BitInt types
gcc.dg/torture/bitint-18.c triggers an ICE in push_partial_def when
compiling for RISC-V with -O2. The issue occurs because
build_nonstandard_integer_type cannot handle bit widths larger than
MAX_FIXED_MODE_SIZE.
For BITINT_TYPE with maxsizei > MAX_FIXED_MODE_SIZE, use build_bitint_type
instead of build_nonstandard_integer_type, similar to what tree-sra.cc does.
gcc/ChangeLog:
* tree-ssa-sccvn.cc (vn_walk_cb_data::push_partial_def): Use
build_bitint_type for BITINT_TYPE when maxsizei exceeds
MAX_FIXED_MODE_SIZE.
Kito Cheng [Mon, 12 Jan 2026 13:31:11 +0000 (21:31 +0800)]
RISC-V: Add support for _BitInt [PR117581]
This patch implements _BitInt support for RISC-V target by defining the
type layout and ABI requirements. The limb mode selection is based on
the bit width, using appropriate integer modes from QImode to TImode.
The implementation also adds the necessary libgcc version symbols for
_BitInt runtime support functions.
Changes in v3:
- Require sync_char_short effective target for bitint-64.c, bitint-82.c
and bitint-84.c tests since they use atomic operations.
- Add -fno-section-anchors to bitint-32-on-rv64.c and adjust expected
assembly output patterns.
Changes in v2:
- limb_mode use up to XLEN when N > XLEN, which is different setting from
the abi_limb_mode.
- Adding missing floatbitinthf in libgcc.
gcc/ChangeLog:
PR target/117581
* config/riscv/riscv.cc (riscv_bitint_type_info): New function.
(TARGET_C_BITINT_TYPE_INFO): Define.
gcc/testsuite/ChangeLog:
PR target/117581
* gcc.dg/torture/bitint-64.c: Add sync_char_short effective target
requirement.
* gcc.dg/torture/bitint-82.c: Likewise.
* gcc.dg/torture/bitint-84.c: Likewise.
* gcc.target/riscv/bitint-32-on-rv64.c: New test.
* gcc.target/riscv/bitint-alignments.c: New test.
* gcc.target/riscv/bitint-args.c: New test.
* gcc.target/riscv/bitint-sizes.c: New test.
libgcc/ChangeLog:
PR target/117581
* config/riscv/libgcc-riscv.ver: New file.
* config/riscv/t-elf (SHLIB_MAPFILES): Add libgcc-riscv.ver.
* config/riscv/t-softfp32 (softfp_extras): Add floatbitinttf and
fixtfbitint.
This also fixes PR 122843 by optimizing out the xor such that we get:
```
_1 = b.a;
_21 = (<unnamed-signed:3>) t_23(D);
// t_23 in the original testcase was 200 so this is reduced to 0
_5 = _1 ^ _21;
# .MEM_24 = VDEF <.MEM_13>
b.a = _5;
```
And then there is no cast catch this pattern:
`(bit_xor (convert1? (bit_xor:c @0 @1)) (convert2? (bit_xor:c @0 @2)))`
As we get:
```
_21 = (<unnamed-signed:3>) t_23(D);
_5 = _1 ^ _21;
_22 = (<unnamed-signed:3>) t_23(D);
_7 = _5 ^ _22;
_25 = (<unnamed-signed:3>) t_23(D);
_8 = _7 ^ _25;
_26 = (<unnamed-signed:3>) t_23(D);
_9 = _7 ^ _26;
```
After unrolling and then fre will optimize away all of those xor.
Patrick Palka [Mon, 12 Jan 2026 16:21:14 +0000 (11:21 -0500)]
c++: deferred noexcept parsing for friend tmpl spec [PR123189]
Since we now defer noexcept parsing for templated friends, a couple of
routines related to deferred parsing need to be updated to cope with friend
template specializations -- their TI_TEMPLATE is a TREE_LIST rather than
a TEMPLATE_DECL, and they don't introduce new template parameters.
PR c++/123189
gcc/cp/ChangeLog:
* name-lookup.cc (binding_to_template_parms_of_scope_p):
Gracefully handle TEMPLATE_INFO whose TI_TEMPLATE is a TREE_LIST.
* pt.cc (maybe_begin_member_template_processing): For a friend
template specialization consider its class context instead.
Jason Merrill [Fri, 9 Jan 2026 06:01:26 +0000 (14:01 +0800)]
c++: more gnu_inline linkage adjustment
Since r16-6477 we allow a gnu_inline to be a key method, because it is only
emitted in one place. It occurs to me that we should make the same
adjustment to other places that check DECL_DECLARED_INLINE_P to decide if a
function has inline/vague/comdat linkage.
PR libstdc++/123326
gcc/cp/ChangeLog:
* cp-tree.h (DECL_NONGNU_INLINE_P): New.
* decl.cc (duplicate_decls, start_decl): Check it.
* decl2.cc (vague_linkage_p, import_export_class): Likewise.
(vtables_uniquely_emitted, import_export_decl): Likewise.
* class.cc (determine_key_method): Check it instead of
lookup_attribute.
libiberty: Make `objalloc_free' `free'-like WRT null pointer
Inspired by a suggestion from Jan Beulich to make one of `objalloc_free'
callers `free'-like with respect to null pointer argument handling make
the function return with no action taken rather than crashing when such
a pointer is passed. This is to make the API consistent with ISO C and
to relieve all the callers from having to check for a null pointer.
libiberty/
* objalloc.c (objalloc_free): Don't use the pointer passed if
null.
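The `free'-like convention can be illustrated with a minimal C sketch. The `struct pool` type and the `pool_create' and `pool_free' functions are hypothetical stand-ins, not the libiberty objalloc API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator handle owning a single heap chunk.  */
struct pool
{
  void *chunk;
};

/* Create a pool with SIZE bytes of storage; returns NULL on failure.  */
static struct pool *pool_create(size_t size)
{
  struct pool *p = malloc(sizeof *p);
  if (p == NULL)
    return NULL;
  p->chunk = malloc(size);
  if (p->chunk == NULL)
    {
      free(p);
      return NULL;
    }
  return p;
}

/* Release a pool.  Like free, do nothing when passed a null pointer,
   so callers need no check of their own.  */
static void pool_free(struct pool *p)
{
  if (p == NULL)
    return;
  free(p->chunk);
  free(p);
}
```

With the early return, cleanup paths can call `pool_free' unconditionally, even when the earlier `pool_create' call failed.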
Martin Jambor [Mon, 12 Jan 2026 12:32:06 +0000 (13:32 +0100)]
ipa-cp: Fix ipa-bit-cp test for recipient_only lattices
Unfortunately I made a silly copy-and-paste error in my patch
introducing the recipient_only flag. This patch fixes it, correctly
bailing out in ipa-bit-cp when it is set during propagation.
gcc/ChangeLog:
2026-01-12 Martin Jambor <mjambor@suse.cz>
PR ipa/123543
* ipa-cp.cc (propagate_bits_across_jump_function): Fix test for
recipient_only_p.
Jakub Jelinek [Mon, 12 Jan 2026 11:40:31 +0000 (12:40 +0100)]
s390: Fix ABI issue in libstdc++.so.6
On Sat, Jan 10, 2026 at 05:24:15PM +0100, Stefan Schulze Frielinghaus wrote:
> libstdc++-v3/ChangeLog:
>
> * config/abi/post/s390x-linux-gnu/baseline_symbols.txt: Add
> names {,P,K}DF16.
This is wrong - an ABI issue.
You can't export new symbols in CXXABI_1.3.14 symbol version when they
weren't exported there in GCC 13.1 already.
Symbols new in GCC 16 like these should be exported in CXXABI_1.3.17.
Fixed thusly.
2026-01-12 Jakub Jelinek <jakub@redhat.com>
* config/abi/pre/gnu.ver (CXXABI_1.3.14): Don't export _ZTI*DF16_ on
s390x.
(CXXABI_1.3.17): Export _ZTI*DF16_ on s390x.
* config/abi/post/s390x-linux-gnu/baseline_symbols.txt: Remove
_ZTI{,P,K}DF16_.
Richard Biener [Mon, 12 Jan 2026 09:04:49 +0000 (10:04 +0100)]
tree-optimization/122830 - move VN through aggregate copies
The following generalizes the few hacks we have to more loosely
allow VN through aggregate copies to a more general (but also
restrictive) feature to rewrite the lookup to a new base with
a constant offset. This should now allow all constant-indexed
aggregate copies and it never leaves any stray components
while hoping for the best.
This resolves the diagnostic regression reported in PR122824.
PR tree-optimization/122830
PR tree-optimization/122824
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Generalize
aggregate copy handling when no variable offsets are
involved.
* gcc.dg/tree-ssa/ssa-fre-112.c: New testcase.
* g++.dg/warn/Warray-bounds-pr122824.C: Likewise.
Richard Biener [Mon, 12 Jan 2026 09:36:44 +0000 (10:36 +0100)]
Fix extra_off mis-computation during aggregate copy VN
With the rewrite of aggregate copy handling in r16-2729-g0d276cd378e7a4
there's an error introduced which accumulates extra_off even if we
throw away some of the tentative component consumption. The following
fixes this.
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Only tentatively
accumulate extra_off when tentatively consuming components
during aggregate copy handling.
Tomasz Kamiński [Mon, 12 Jan 2026 10:07:35 +0000 (11:07 +0100)]
libstdc++: Fix generate_canonical test for 128bit floating points.
This updates test01, so it properly handles 128bit floating points,
including the situation when long double uses such a representation.
Firstly, the computation of skips is corrected, by discarding number
values equal to number of calls required to generate element
(skips become zero for all non-float correctly). Furthermore, checks
of histogram for types using iec559 representation, is moved inside
test01 function, so we use correct value for long double, depending
on number of digits in mantissa on given platform.
We also extend test to cover __float128, to test 128bit floating
point on more platforms.
Richard Biener [Fri, 9 Jan 2026 08:35:21 +0000 (09:35 +0100)]
middle-end/123175 - fix parts of const VEC_PERM with relaxed input sizes
The following fixes enough of const VEC_PERM folding and lowering
to deal with the fallout for the two testcases from the PR. We
usually do not generate such problematic VEC_PERM expressions, but
we allow those since GCC 14. As can be seen we mishandle those,
including failure to expand/lower them by zero-extending inputs (which is
what __builtin_shufflevector does).
I'm unsure as to what extent we get such permutes but Tamar indicates
that aarch64 can handle those at least.
PR middle-end/123175
* match.pd (vec_perm @0 @1 @2): Fixup for inputs having a
different number of elements than the result.
* tree-vect-generic.cc (lower_vec_perm): Likewise.
* gcc.dg/torture/pr123175-1.c: New testcase.
* gcc.dg/torture/pr123175-2.c: Likewise.
Rainer Orth [Mon, 12 Jan 2026 09:36:19 +0000 (10:36 +0100)]
libgomp: Skip libgomp.c++/target-cdtor-2.C on Solaris [PR81337]
The libgomp.c++/target-cdtor-2.C test FAILs on Solaris:
FAIL: libgomp.c++/target-cdtor-2.C output pattern test
Compared to the Linux output
~S, 5, 1
[...]
finiDH1, 1
the Solaris output has a different order:
finiDH1, 1
[...]
~S, 5, 1
This is another instance of the long-standing PR c++/81337. As detailed
there, the relative order of ~S::S() and __attribute__((destructor()))
functions isn't guaranteed. Since xfail'ing the dg-output parts isn't
practical, this patch skips the whole test on Solaris.
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
Nathaniel Shead [Sat, 10 Jan 2026 23:02:45 +0000 (10:02 +1100)]
c++: Improve diagnostic for implicit conversion errors [PR115163]
This patch adds a note to indicate if any viable explicit conversion
functions were skipped if an implicit conversion failed to occur.
Perhaps the base diagnostic in ocp_convert can be further improved for
class types as well, as the current message is not very clear, but I've
not looked into that for this patch.
Jakub Jelinek [Mon, 12 Jan 2026 09:06:47 +0000 (10:06 +0100)]
simplify-rtx: Fix up shift/rotate VOIDmode count handling [PR123523]
The following testcase ICEs on i686-linux, because the HW in that
case implements the shift as shifting by 64-bit count (anything larger
or equal to number of bits in the first operand's element results
in 0 or sign copies), so the machine description implements it as
such as well.
Now, because shifts/rotates can have different modes on the first
and second operand, when the second one has VOIDmode (i.e. CONST_INT,
I think CONST_WIDE_INT has non-VOIDmode and CONST_DOUBLE with VOIDmode
is hopefully very rarely used), we need to choose some mode for the
wide_int conversion. And so far we've been choosing BITS_PER_WORD/word_mode
or the mode of the first operand's element, whichever is wider.
That works fine on 64-bit targets, CONST_INT has always at most 64 bits,
but for 32-bit targets uses SImode.
Because HOST_BITS_PER_WIDE_INT is always 64, the following patch just
uses that plus DImode instead of BITS_PER_WORD and word_mode.
2026-01-12 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/123523
* simplify-rtx.cc (simplify_const_binary_operation): Use
DImode for VOIDmode shift and truncation counts if int_mode
is narrower than HOST_BITS_PER_WIDE_INT rather than
word_mode if int_mode is narrower than BITS_PER_WORD.
Jakub Jelinek [Mon, 12 Jan 2026 09:05:50 +0000 (10:05 +0100)]
c++: Remove gnu::gnu_inline attribute on inheriting ctors [PR123526]
The recent addition of gnu::gnu_inline attributes to some C++26 constexpr
methods broke classes which inherit e.g. from std::logic_error or other
C++26 classes with gnu::gnu_inline constructors and use inheriting
constructors. On std::logic_error etc. it has the desired effect that
the ctor itself can be constexpr evaluated and even inlined, but is not
emitted in each TU that needs it and didn't inline it, but is still
contained in libstdc++.{a,so.6}.
Unfortunately inheriting ctors inherit also attributes of the corresponding
ctors except those that clone_attrs filter out and that includes the
gnu_inline attribute if explicitly specified on the base class ctor.
That has the undesirable effect that the implementation detail of e.g.
the std::logic_error class leaks into the behavior of a class that inherits
from it if it is using inheriting constructors, those will result in
undefined symbols for the inheriting constructors if they aren't inlined,
unless one also inherits from it in some TU without gnu_inline there (e.g.
one compiled with -std=c++23 or earlier).
So, the following patch fixes it by removing the gnu::gnu_inline attribute
from the inheriting constructor. Not done in clone_attrs because that
function is also used for the normal constructor cloning and in that case
we do want to clone those attributes.
2026-01-12 Jakub Jelinek <jakub@redhat.com>
PR c++/123526
* method.cc: Include attribs.h.
(implicitly_declare_fn): Remove gnu::gnu_inline attribute.
* g++.dg/ext/gnu-inline-inh-ctor1.C: New test.
* g++.dg/ext/gnu-inline-inh-ctor2.C: New test.
Uros Bizjak [Mon, 12 Jan 2026 08:43:29 +0000 (09:43 +0100)]
testsuite: Remove lp64 requirement from gcc.target/i386/pr123121.c [PR123121]
The test gcc.target/i386/pr123121.c does not rely on LP64-specific
behavior. Drop the dg-require-effective-target lp64 directive so the
test can run on 32-bit i386 targets as well.
The testcase test-frame-related.c fails in 32-bit mode due to
constraints not matching. Use -mpowerpc64 option to ensure that the
testcase works with -m32.
Andrew Pinski [Mon, 12 Jan 2026 03:57:20 +0000 (19:57 -0800)]
testsuite: Disable vector-compare-1.C for arm targets [PR121752]
So arm is a bit special, non_strict_align is sometimes true but
it does not represent the true value of STRICT_ALIGN inside the compiler,
so this testcase fails. This disables the testcase for arm targets where
STRICT_ALIGN is always true even when there are unaligned loads.
Pushed as obvious after testing on x86_64 and arm-eabi (with -march=armv7) to make
sure the testcase no longer runs on arm.
PR testsuite/121752
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/vector-compare-1.C: Disable for arm targets.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Steven G. Kargl [Mon, 12 Jan 2026 02:58:19 +0000 (18:58 -0800)]
Fortran: Test cases from previously fixed bug
Adding two testcases from Gerhard Steinmetz from 2016-08-30.
These have had the dejagnu directives added. The last comment
in the PR, from Andrew Pinski, notes the PR was fixed in the 9.3,
10+ timeframe. The testcases are small. Committing the tests to
ensure things are not broken in the future.
PR fortran/77415
gcc/testsuite/ChangeLog:
* gfortran.dg/pr77415_1.f90: New test.
* gfortran.dg/pr77415_2.f90: New test.
Pietro Monteiro [Sun, 11 Jan 2026 22:25:12 +0000 (17:25 -0500)]
libga68: Make it possible to debug the GC
If GC_DEBUG is defined then all-upper-case macros will expand to calls
to the debug variant of collector functions.
So add the configury bit to define GC_DEBUG if the user wants and
switch all `GC_` calls to the corresponding macros.
libga68/ChangeLog:
* configure: Regenerate.
* configure.ac: Add --enable-algol68-gc-debug option and
define GC_DEBUG accordingly.
* ga68-alloc.c (_libga68_realloc): Use the C macro version of
the GC function.
(_libga68_realloc_unchecked): Likewise.
(_libga68_malloc): Likewise.
Signed-off-by: Pietro Monteiro <pietro@sociotechnical.xyz>
Michal Jires [Fri, 19 Dec 2025 16:09:16 +0000 (17:09 +0100)]
lto: Fix SegFault in ICF caused by missing body
During LTO symbol merging, weak symbols may be resolved to an external
definition.
We reset the symbol, so the body might be released in unreachability
pass. But we didn't mark the symbol with body_removed, so ICF assumed
the body was still there causing SegFault.
gcc/lto/ChangeLog:
* lto-symtab.cc (lto_symtab_merge_symbols): Set body_removed
for symbols resolved outside of IR.
gcc/testsuite/ChangeLog:
* gcc.dg/lto/attr-weakref-2_0.c: New test.
* gcc.dg/lto/attr-weakref-2_1.c: New test.
Michal Jires [Sun, 16 Nov 2025 19:16:15 +0000 (20:16 +0100)]
lto: Add toplevel simple assembly heuristics
This new pass heuristically detects symbols referenced by toplevel
assembly to prevent their optimization.
The heuristic compares identifiers in the assembly to known
symbols.
The pass is split into two passes, one in LGEN and one in WPA.
The WPA pass is needed to be able to reference any symbol.
However, in WPA there may be multiple symbols with the same name,
so we handle those local symbols in LGEN.
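A rough sketch of the identifier comparison, under stated assumptions: it is
not the code in asm-toplevel.cc. The function names and the set of characters
treated as part of an identifier (letters, digits, '_', '.', '$') are
hypothetical choices for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed identifier alphabet; a real assembler dialect may differ.  */
static bool
is_ident_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_' || c == '.' || c == '$';
}

/* Return true if NAME occurs in ASM_TEXT as a whole identifier, so that
   "foo" does not match inside "foobar".  */
bool
asm_mentions_symbol (const char *asm_text, const char *name)
{
  size_t len = strlen (name);
  if (len == 0)
    return false;

  for (const char *p = asm_text; *p;)
    {
      if (!is_ident_char (*p))
        {
          p++;
          continue;
        }
      /* Scan one maximal run of identifier characters.  */
      const char *start = p;
      while (*p && is_ident_char (*p))
        p++;
      if ((size_t) (p - start) == len && strncmp (start, name, len) == 0)
        return true;
    }
  return false;
}

/* Set MARKED[i] for every known symbol mentioned in ASM_TEXT and return
   how many were marked.  */
size_t
mark_referenced (const char *asm_text, const char *const *symbols,
                 size_t n, bool *marked)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    {
      marked[i] = asm_mentions_symbol (asm_text, symbols[i]);
      count += marked[i];
    }
  return count;
}
```

Matching whole tokens rather than substrings keeps the heuristic from marking
symbols whose names merely appear inside longer identifiers.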
gcc/ChangeLog:
* asm-toplevel.cc (mark_fragile_ref_by_asm):
Add marked_local to handle symbol as local.
(ipa_asm_heuristics): New.
(class pass_ipa_asm): New.
(make_pass_ipa_asm_lgen): New.
(make_pass_ipa_asm_wpa): New.
* common.opt: New flto-toplevel-asm-heuristics.
* passes.def: New asm passes.
* timevar.def (TV_IPA_LTO_ASM): New.
* tree-pass.h (make_pass_ipa_asm_lgen): New.
(make_pass_ipa_asm_wpa): New.
gcc/testsuite/ChangeLog:
* gcc.dg/lto/toplevel-simple-asm-1_0.c: New test.
* gcc.dg/lto/toplevel-simple-asm-1_1.c: New test.
* gcc.dg/lto/toplevel-simple-asm-2_0.c: New test.
* gcc.dg/lto/toplevel-simple-asm-2_1.c: New test.
Michal Jires [Wed, 3 Dec 2025 01:16:54 +0000 (02:16 +0100)]
lto: Allow other partitionings for toplevel assembly
For balanced and max partitioning this adds proper partitioning of asm
and related symbols.
The special symbols are partitioned with 1to1 and joined together if
there is no name conflict. All other symbols are partitioned with the
requested partitioning.
In typical usage, with a small number of toplevel assembly statements and no
name conflicts, all special symbols will be in the single first partition.
Balanced partitioning will continue filling the last asm partition.
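A toy model of joining 1to1 partitions when there is no name conflict. It is
only an illustration and not the code in lto-partition.cc: struct
toy_partition, toy_join_partitions, the fixed MAX_NAMES capacity and the
greedy first-fit order are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 16

/* Hypothetical partition: just the names of the symbols it holds.  */
struct toy_partition
{
  const char *names[MAX_NAMES];
  size_t n_names;
};

static bool
has_name (const struct toy_partition *p, const char *name)
{
  for (size_t i = 0; i < p->n_names; i++)
    if (strcmp (p->names[i], name) == 0)
      return true;
  return false;
}

/* Two partitions conflict if any symbol name appears in both.  */
static bool
conflicts (const struct toy_partition *a, const struct toy_partition *b)
{
  for (size_t i = 0; i < b->n_names; i++)
    if (has_name (a, b->names[i]))
      return true;
  return false;
}

/* Greedily join PARTS in place: each partition is merged into the first
   earlier partition it does not conflict with, otherwise it is kept
   separate.  Return the number of partitions left.  */
size_t
toy_join_partitions (struct toy_partition *parts, size_t n)
{
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t j;
      for (j = 0; j < out; j++)
        if (!conflicts (&parts[j], &parts[i])
            && parts[j].n_names + parts[i].n_names <= MAX_NAMES)
          break;
      if (j == out)
        {
          /* No compatible earlier partition: keep this one separate.  */
          parts[out++] = parts[i];
          continue;
        }
      for (size_t k = 0; k < parts[i].n_names; k++)
        parts[j].names[parts[j].n_names++] = parts[i].names[k];
    }
  return out;
}
```

With no name conflicts at all, every partition folds into the first one, which
matches the typical case described above.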
gcc/lto/ChangeLog:
* lto-partition.cc (join_partitions): Declare.
(lto_1_to_1_map): Split out to..
(map_1_to_1): ..here.
(create_asm_partition): Replaced by..
(create_asm_partitions): ..this.
(lto_max_map): Use new create_asm_partitions.
(lto_balanced_map): Use new create_asm_partitions.
gcc/testsuite/ChangeLog:
* gcc.dg/lto/toplevel-extended-asm-2_0.c: More partitionings.
* gcc.dg/lto/toplevel-extended-asm-2_1.c: Likewise.
Michal Jires [Sun, 16 Nov 2025 14:45:21 +0000 (15:45 +0100)]
lto: Handle .local symbols in toplevel extended assembly
.local symbols cannot become global, so we have to use must_remain_in_tu.
There is no way to mark a declaration as both external and static/.local
in C. So we have to disable the implicit definition of static variables.
Also, a .local asm function still produces a "used but never defined" warning.
* gcc.dg/lto/toplevel-extended-asm-2_0.c: New test.
* gcc.dg/lto/toplevel-extended-asm-2_1.c: New test.
* gcc.dg/lto/toplevel-extended-asm-3_0.c: New test.
* gcc.dg/lto/toplevel-extended-asm-3_1.c: New test.
Michal Jires [Thu, 18 Dec 2025 13:58:15 +0000 (14:58 +0100)]
lto: Add must_remain_in_tu flags to symtab_node
With toplevel assembly we are sometimes not allowed to globalize static
symbols. So such symbols cannot be in more than one partition.
The must_remain_in_tu_* flags guarantee that such symbols or references to them
do not escape the original translation unit. Thus 1to1 partitioning is always
valid.