Egas Ribeiro [Thu, 25 Dec 2025 12:33:47 +0000 (12:33 +0000)]
c++: Fix ICE with requires-expression in lambda requires-clause [PR123080]
When parsing a lambda with a trailing requires-clause, calling
cp_parser_requires_clause_opt confused generic lambda parsing when
implicit template parameters were involved.
After failing to parse a type-requirement during tentative parsing and
hitting error recovery code in cp_parser_skip_to_end_of_statement that
aborted the current implicit template, attempting to finish the lambda
declaration caused ICEs.
Fix this by not aborting the current implicit template during tentative
parsing and adding cleanup for fully_implicit_function_template_p.
PR c++/123080
gcc/cp/ChangeLog:
* parser.cc (cp_parser_skip_to_end_of_statement): Don't abort
implicit templates during tentative parsing.
(cp_parser_lambda_declarator_opt): Add cleanup for
fully_implicit_function_template_p before parsing
trailing_requires_clause.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/lambda-requires6.C: New test.
* g++.dg/cpp2a/lambda-requires6a.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Nathaniel Shead [Fri, 7 Nov 2025 12:24:17 +0000 (23:24 +1100)]
c++: Fold non-ODR usages of potentially constant values early [PR120005]
[basic.link] p14.4 says that a declaration naming a TU-local entity is
an exposure, ignoring "any reference to a non-volatile const object or
reference with internal or no linkage initialized with a constant
expression that is not an odr-use".
To implement this, we cannot stream these entities but must fold them
into their underlying values beforehand. This was already done to a
degree by cp_fold, but it didn't handle all cases, and notably was not
performed on the saved body of a constexpr function, which would then
cause errors during modules streaming.
This patch implements this by supplementing cp_fold with additional
rules to fold non-ODR usages of constants. We need to do this as an
additional walk before saving the constexpr function definition, so we
also disable as much other folding during this walk as possible to
prevent removing any information that the constexpr interpreter
requires to function correctly.
With this we will still error on uses in templates. In general it's
impossible to tell within an uninstantiated template body whether a
reference is an ODR-use in the face of dependent expressions, so we
don't attempt to do anything for this case.
PR c++/119097
PR c++/120005
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1): Fall back to
location from parent expression if needed.
* cp-gimplify.cc (enum fold_flags): Add ff_only_non_odr.
(cp_fold_data::cp_fold_data): Assert invariant for flags.
(cp_fold_omp_clause_refs_r): New function.
(cp_fold_r): Specially handle OMP_CLAUSE_DECL.
(cp_fold_function_non_odr_use): New function.
(cp_fold_non_odr_use_1): New function.
(cp_fold_maybe_rvalue): Fold non-ODR uses when requested.
(cp_fold_non_odr_use): New function.
(fold_caches): Increase number of caches.
(get_fold_cache): Use a new cache for non-ODR use walks.
(cp_fold): Skip most folding for non-ODR use walks; always
fold constant-initialized references; remove dead code to
fold __builtin_source_location.
* cp-tree.h (cp_fold_function_non_odr_use): Declare.
(cp_fold_non_odr_use): Declare.
* decl.cc (finish_function): Fold non-ODR uses before saving
constexpr fundef. Invoke PLUGIN_PRE_GENERICIZE before this
folding.
* ptree.cc (cxx_print_xnode): Handle TU_LOCAL_ENTITY.
* tree.cc (bot_manip): Propagate TREE_CONSTANT.
* typeck2.cc (digest_nsdmi_init): Fold non-ODR uses in NSDMIs.
Jakub Jelinek [Mon, 29 Dec 2025 13:00:02 +0000 (14:00 +0100)]
auto-profile.cc: Fix build with C++14
On Tue, Dec 23, 2025 at 11:01:36AM +0530, Dhruv Chawla wrote:
> Committed as:
> - r16-6347-g84058c3cc805f7
This broke building gcc with C++14 system compilers.
../../gcc/auto-profile.cc: In member function ‘std::pair<const char*, int> autofdo::string_table::get_original_name(const char*) const’:
../../gcc/auto-profile.cc:1129:7: warning: init-statement in selection statements only available with ‘-std=c++17’ or ‘-std=gnu++17’ [-Wc++17-extensions]
1129 | if (symtab_node *n
| ^~~~~~~~~~~
This is valid only in C++17 and later.
Fixed thusly.
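For illustration only, a minimal sketch of this kind of rewrite, not the
actual auto-profile.cc change. The names find_symbol and
positive_value_or_zero and the std::map table are hypothetical stand-ins;
the point is only that hoisting the declaration out of the condition keeps
the code valid for a C++14 system compiler.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a symbol lookup; not the auto-profile.cc code.
inline const int *
find_symbol (const std::map<std::string, int> &table, const char *name)
{
  assert (name != nullptr);
  auto it = table.find (name);
  return it == table.end () ? nullptr : &it->second;
}

// C++17-only form, which a C++14 compiler warns about:
//   if (const int *n = find_symbol (table, name); n && *n > 0)
//     return *n;
//
// C++14-compatible form: declare the variable before the if.
inline int
positive_value_or_zero (const std::map<std::string, int> &table,
                        const char *name)
{
  const int *n = find_symbol (table, name);
  if (n && *n > 0)
    return *n;
  return 0;
}
```

The only difference from the C++17 form is that the hoisted variable stays
in scope after the if statement.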
2025-12-29 Jakub Jelinek <jakub@redhat.com>
* auto-profile.cc (string_table::get_original_name): Avoid using
init-statement in selection statement.
Rainer Orth [Mon, 29 Dec 2025 11:09:35 +0000 (12:09 +0100)]
build: Cherry-pick libtool.m4 support for GNU ld *_sol2 emulations
GNU ld gained separate Solaris-specific linker emulations (*_sol2) long
ago. Since their introduction, GCC has preferred them over their
non-*_sol2 counterparts but supported both forms. This has changed for
GCC 16: since all supported versions of GNU ld do support the *_sol2
emulations, GCC now uses them unconditionally.
libtool has also been updated to handle this since libtool 2.4.2 back in
2011. However, that change has only partially been backported to the
heavily patched libtool.m4 in the GCC tree: the sparcv9 part is there,
but the amd64 part is missing for some reason. This causes problems
with some recent binutils changes.
Therefore this patch cherry-picks the libtool patch to bring
Solaris/x86_64 in sync with Solaris/sparcv9 and upstream libtool.
Bootstrapped without regressions on {amd64,i386}-pc-solaris2.11 and
{sparcv9,sparc}-sun-solaris2.11.
Jose E. Marchesi [Mon, 29 Dec 2025 03:18:52 +0000 (04:18 +0100)]
a68: use LMD instead of LM for mode labels in exports
dwarf2out uses "LM" for line-info labels in text sections, using the
global counter line_info_label_num to get unique label names. The
Algol 68 exports were using the same string for the mode labels in the
.a68_exports sections using its own private counter. This made
assemblers unhappy when they found duplicated labels in the
input assembly files.
This commit changes the names of mode labels in Algol 68 export
sections to use the "LMD" string instead.
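The clash can be modelled with a small sketch. make_label and
labels_collide are hypothetical helpers, and the "." + prefix + number
spelling is assumed from the .LMnn and .LMDnn forms named in the ChangeLog
entry; this is not the dwarf2out or a68-exports code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spell a local label from a prefix and a counter
// value, e.g. ("LM", 3) -> ".LM3".
inline std::string
make_label (const std::string &prefix, unsigned counter)
{
  assert (!prefix.empty ());
  return "." + prefix + std::to_string (counter);
}

// Two independent counters only clash when they share a prefix and happen
// to reach the same value.
inline bool
labels_collide (const std::string &prefix_a, unsigned counter_a,
                const std::string &prefix_b, unsigned counter_b)
{
  return make_label (prefix_a, counter_a) == make_label (prefix_b, counter_b);
}
```

Since the counter is always rendered as digits, an "LMD" label can never
spell the same name as an "LM" label, whatever the two counters hold.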
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-exports.cc (a68_asm_output_mode): Use .LMDnn labels for
modes instead of .LMnn.
Andrew Pinski [Fri, 26 Dec 2025 22:30:22 +0000 (14:30 -0800)]
LRA: Fix eliminate regs into a subreg inside a debug insn [PR123295]
The problem here is that during LRA we eliminate argp and try to
simplify the RTL as we go, but inside a debug insn almost all subregs
are valid because of the gen_lowpart_for_debug done during debug insn
simplification. simplify_gen_subreg will therefore fail on some of these
subregs and return null, which causes problems later on. The solution is
to create a raw SUBREG, as is done in lra_substitute_pseudo for debug insns.
Bootstrapped and tested on x86_64-linux-gnu.
PR rtl-optimization/123295
gcc/ChangeLog:
* lra-eliminations.cc (lra_eliminate_regs_1): For a debug
insn, create a raw SUBREG if simplify_gen_subreg fails.
gcc/testsuite/ChangeLog:
* gcc.dg/pr123295-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jose E. Marchesi [Sun, 28 Dec 2025 19:05:50 +0000 (20:05 +0100)]
a68: document finding module exports in the manual
This commit contains some small updates to the ga68 user manual, on
the topic of modules, exports and the modules map.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* ga68.texi (Packets): Update to specify only particular programs
and prelude packets are currently supported.
(Writing modules): Few updates.
(Module activation): Likewise.
(Modules and exports): New section.
Jose E. Marchesi [Sun, 28 Dec 2025 00:35:35 +0000 (01:35 +0100)]
a68: fix deduplication of imported modes
The internal global list of modes maintained by the compiler should
not contain two modes that are equivalent. When importing module
interfaces, the modes in these interfaces should get deduplicated
before being "interned" in the compiler's modes list. This commit
fixes the deduplication to accommodate the fact that more than one
module interface may be read from a given packet (compilation unit)
and also that multiple interfaces may be imported indirectly via
publicized modules.
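As a rough sketch of the interning idea only: ModeList and
intern_interfaces are hypothetical, and a mode is reduced here to a
canonical string so that "equivalent" simply means "same spelling". The
compiler's real mode equivalence is structural and lives in the a68 front
end, not in this code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the global modes list.
class ModeList
{
public:
  // Return the index of an equivalent mode, appending only if none exists.
  std::size_t intern (const std::string &mode)
  {
    assert (!mode.empty ());
    for (std::size_t i = 0; i < modes_.size (); ++i)
      if (modes_[i] == mode)
        return i;
    modes_.push_back (mode);
    return modes_.size () - 1;
  }

  std::size_t size () const { return modes_.size (); }

private:
  std::vector<std::string> modes_;
};

// Intern the modes of every interface read from one packet, so duplicates
// across interfaces are caught as well as duplicates within one.
inline std::size_t
intern_interfaces (ModeList &list,
                   const std::vector<std::vector<std::string>> &interfaces)
{
  for (const auto &iface : interfaces)
    for (const auto &mode : iface)
      list.intern (mode);
  return list.size ();
}
```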
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-imports.cc (a68_decode_modes): Do not deduplicate imported
modes here.
(a68_open_packet): Do it here.
* a68-parser-extract.cc (extract_revelation): Recurse into
publicized modules after interning modes in the current module,
not before.
gcc/testsuite/ChangeLog
* algol68/execute/modules/module23bar.a68: New test.
* algol68/execute/modules/module23foo.a68: Likewise.
* algol68/execute/modules/program-23.a68: Likewise.
Rainer Orth [Sun, 28 Dec 2025 10:32:37 +0000 (11:32 +0100)]
Support Solaris CTF generation
The Solaris Compact C Type Format, CTF, was introduced back in Solaris
9. It is the precursor to current GNU CTF, meant primarily for tools
like the low-level debugger mdb or DTrace that would like to avoid the
overhead of full DWARF-2 debugging information. However, for a long
time creation required separate steps to convert DWARF information
(version 2 only) to CTF in the input objects and later merge this into
the final objects. The tools to do so were available, but they were
barely documented and their use restricted to the core OS because of
this difficulty.
There's recently been a massive effort to simplify this and allow for
wider adoption. The native linker has been extended to take GNU CTF
info in the input objects and convert that to Solaris CTF itself. At
the same time, the massively enhanced tools and the format itself are
fully documented.
To make this even simpler to use, this patch introduces a new -gsctf
option to hide the details from users. At compile time, it just passes
-gctf to the compiler, and at link time it invokes ld with -z ctf.
Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11.
Rainer Orth [Sun, 28 Dec 2025 10:08:07 +0000 (11:08 +0100)]
testsuite: i386: Fix up check-function-bodies tests
Several recent tests that use check-function-bodies on x86 FAIL on Solaris:
they all lack dg-add-options check_function_bodies which is required to
handle some Solaris differences. One test also needs -fomit-frame-pointer
to deal with a different Solaris/x86 default.
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
Rainer Orth [Sun, 28 Dec 2025 09:43:04 +0000 (10:43 +0100)]
Don't check for -xbrace_comment with Solaris/x86 as
With Solaris/x86 as, GCC uses the -xbrace_comment option if supported.
Since it is present in the Solaris 11.4 FCS assembler and 11.4 is the
only supported Solaris version, this check is no longer necessary.
Bootstrapped without regressions on i386-pc-solaris2.11.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-parser-taxes.cc (tax_module_dec): Do not handle
DEFINING_MODULE_INDICANT.
* a68-exports.cc (a68_add_module_to_moif): Do not mangle module
names in module extracts.
(add_pub_revelations_to_moif): New function.
(a68_do_exports): Simplify and call add_pub_revelations_to_moif.
* a68-imports.cc (a68_decode_moifs): Add all decoded moifs to the
global list TOP_MOIF.
* a68-parser-extract.cc (extract_revelation): Recurse to import
extracts from publicized modules.
(a68_extract_indicants): Do not add symbol table entries for
defining modules.
* a68-types.h (struct TAG_T): Remove field EXPORTED.
(EXPORTED): Remove macro.
(TOP_MOIF): Define.
* a68-parser.cc (a68_parser): Initialize global list of moifs.
(a68_new_tag): Do not initialize EXPORTED.
gcc/testsuite/ChangeLog
* algol68/execute/modules/module22bar.a68: New test.
* algol68/execute/modules/module22foo.a68: Likewise.
* algol68/execute/modules/program-22.a68: Likewise.
* algol68/compile/modules/program-11.a68: Adjust test to
publicized modules.
* algol68/compile/modules/program-error-multiple-delaration-module-1.a68:
Likewise.
Jose E. Marchesi [Sat, 27 Dec 2025 17:42:38 +0000 (18:42 +0100)]
a68: remove coalesce_public_symbols shortcut
As planned, this commit removes a crude hack (the coalescing of 'pub'
symbols right after bottom-up parsing) that I introduced during the
initial implementation of modules. The goal was to get working
separate compilation as soon as possible. Now the rest of the
parser, and also the lowerer pass, is made to know about these 'pub'
symbols.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-parser-bottom-up.cc (a68_bottom_up_error_check): Do not
check for the absence of public-symbols.
(a68_bottom_up_coalesce_pub): Removed function.
* a68-parser.cc (a68_parser): Do not call
a68_bottom_up_coalesce_pub.
* a68-parser-extract.cc (a68_extract_indicants): Adapt to the
presence of public-symbols.
* a68-parser-modes.cc (get_mode_from_proc_variables): Likewise.
* a68-parser-taxes.cc (tax_variable_dec): Likewise.
(tax_proc_variable_dec): Likewise.
(tax_op_dec): Likewise.
(tax_prio_dec): Likewise.
* a68-low-decls.cc (a68_lower_mode_declaration): Adapt to the
presence of public-symbols.
(a68_lower_variable_declaration): Likewise.
(a68_lower_identity_declaration): Likewise.
(a68_lower_procedure_declaration): Likewise.
(a68_lower_procedure_variable_declaration): Likewise.
(a68_lower_brief_operator_declaration): Likewise.
(a68_lower_operator_declaration): Likewise.
gcc/testsuite/ChangeLog
* algol68/compile/module-2.a68: Expand test a little.
Jose E. Marchesi [Sat, 27 Dec 2025 10:09:04 +0000 (11:09 +0100)]
a68: allow joined list of revelations in access clauses
This commit adds support for having a joined list of revelations in
access clauses, like in:
access Module18a,
Module18b,
Module18c
begin assert (foo = 10);
assert (bar = 20);
assert (baz = 30)
end
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-parser-bottom-up.cc (reduce_enclosed_clauses): Reduce joined
list of revelations.
* a68-low-clauses.cc (a68_lower_revelation_ludes): New function.
(a68_lower_access_clause): Use a68_lower_revelation_ludes.
Jose E. Marchesi [Tue, 23 Dec 2025 14:53:36 +0000 (15:53 +0100)]
a68: fix support for nested access clauses
This commit fixes the support for having an access clause as the
controlled clause of another access clause.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-parser-top-down.cc (top_down_access): An access clause may
be nested in another access clause.
* a68-parser-extract.cc (a68_extract_indicants): Coalesce 'pub'
symbols.
(a68_extract_indicants): Nested access clauses are not allowed in module
texts.
* a68-parser-bottom-up.cc (expected_module_text): New function.
(reduce_prelude_packet): Use expected_module_text.
(a68_bottom_up_error_check): Add comment.
gcc/testsuite/ChangeLog
* algol68/compile/error-module-nested-access-1.a68: New test.
* algol68/execute/modules/program-21.a68: Likewise.
Jose E. Marchesi [Mon, 22 Dec 2025 23:52:52 +0000 (00:52 +0100)]
a68: fetch module exports from packet by name
A packet (compilation unit) may emit more than one module interface in
its exports section. This is because a module may publicize the
exports of another module. This commit makes the import infrastructure
read multiple module interfaces from exports sections and then look
for the accessed module in the data.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-types.h (struct MOIF_T): Add chain_next to GTY info.
* a68-imports.cc (a68_decode_modes): Mode offsets are relative to
the start of the moif, not the start of the exports.
(a68_decode_moifs): Renamed from a68_decode_moif and changed to
decode multiple moifs from the exports.
(a68_open_packet): Call a68_decode_moifs and look for the right
moif.
* a68-exports.cc (a68_moif_new): Initialize NEXT (moif).
Jakub Jelinek [Sat, 27 Dec 2025 10:45:18 +0000 (11:45 +0100)]
simplify-rtx: Fix up (ne (ior (ne x 0) y) 0) simplification [PR123114]
The following testcase ICEs on x86_64-linux since the PR52345
(ne (ior (ne x 0) y) 0) simplification was (slightly) fixed.
It wants to optimize
(set (reg/i:DI 10 a0)
(ne:DI (ior:DI (ne:DI (reg:DI 151 [ a ])
(const_int 0 [0]))
(reg:DI 152 [ b ]))
(const_int 0 [0])))
but doesn't check an important property of it, in particular
that the mode of the inner NE operand is the same as the
mode of the inner NE.
The following testcase has
(set (reg:CCZ 17 flags)
(compare:CCZ (ior:QI (ne:QI (reg/v:SI 104 [ c ])
(const_int 0 [0]))
(reg:QI 98 [ _5 ]))
(const_int 0 [0])))
where cmp_mode is QImode, but the mode of the inner NE operand
is SImode instead, and it attempts to create
(ne:CCZ (ior:QI (reg/v:SI 104 [ c ]) (reg:QI 98 [ _5 ])) (const_int 0))
which obviously crashes later on.
The following patch fixes it by checking the mode of the inner NE operand
and also by using CONST0_RTX (cmp_mode) instead of CONST0_RTX (mode)
because that is the mode of the other operand, not mode which is the
mode of the outer comparison (though, guess for most modes it will still
be const0_rtx).
I guess for mode mismatches we could arbitrarily choose some extension (zero
or sign) and extend the narrower mode to the wider mode, but I doubt that it
would ever match on any target. But even then we'd need to limit it, we
wouldn't want to deal with another mode class (say floating point
comparisons), and dunno about vector modes etc.
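To see why the same-mode requirement matters, here is a value-level sketch
of the identity behind the simplification. It assumes an 8-bit unsigned
type as a stand-in for QImode and that the inner NE yields 1 when true; all
function names are hypothetical and none of this is simplify-rtx.cc code.
When x and y share a mode, (ne (ior (ne x 0) y) 0) and (ne (ior x y) 0)
agree for every input. When x is wider than y, as in the testcase, the
rewritten IOR would mix operand modes and has no such reading, so the
simplification has to be skipped.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (ne (ior (ne x 0) y) 0), with x and y in the same mode.
inline bool
original_form (std::uint8_t x, std::uint8_t y)
{
  std::uint8_t inner_ne = (x != 0) ? 1 : 0;
  return static_cast<std::uint8_t> (inner_ne | y) != 0;
}

// Simplified form: (ne (ior x y) 0).
inline bool
simplified_form (std::uint8_t x, std::uint8_t y)
{
  return static_cast<std::uint8_t> (x | y) != 0;
}

// Exhaustively compare the two forms over all 8-bit operand pairs.
inline bool
forms_agree_for_all_qi_values ()
{
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      if (original_form (static_cast<std::uint8_t> (x),
                         static_cast<std::uint8_t> (y))
          != simplified_form (static_cast<std::uint8_t> (x),
                              static_cast<std::uint8_t> (y)))
        return false;
  return true;
}

// Convenience wrapper that aborts if the identity does not hold.
inline void
check_simplification ()
{
  assert (forms_agree_for_all_qi_values ());
}
```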
2025-12-27 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/123114
* simplify-rtx.cc (simplify_context::simplify_relational_operation):
Verify XEXP (XEXP (op0, 0), 0) mode and use CONST0_RTX (cmp_mode)
instead of CONST0_RTX (mode).
Eric Botcazou [Sat, 27 Dec 2025 09:24:52 +0000 (10:24 +0100)]
Ada: Fix assertion failure for unfrozen mutably tagged type as actual
...in instance. An instantiation is a freezing point for the actuals,
so the mutably tagged type will be frozen by the instantiation, but this
happens too late in the current implementation of mutably tagged types,
because the declaration of their CW-equivalent type is not analyzed until
after the type is frozen.
gcc/ada/
PR ada/123306
* sem_ch12.adb (Analyze_One_Association): Immediately freeze the
root type of mutably tagged types used as actual type parameters.
gcc/testsuite/
* gnat.dg/specs/mutably_tagged1.ads: New test.
Jeff Law [Fri, 26 Dec 2025 22:24:56 +0000 (15:24 -0700)]
[RISC-V][PR target/123283] Wrap naked REG operands with a USE.
I was in the process of testing this patch when Andreas filed PR123283.
What's going on is we have patterns in sync.md which have naked operands:
(define_insn "subword_atomic_fetch_strong_<atomic_optab>"
[(set (match_operand:SI 0 "register_operand" "=&r") ;; old value at mem
(match_operand:SI 1 "memory_operand" "+A")) ;; mem location
(set (match_dup 1)
(unspec_volatile:SI
[(any_atomic:SI (match_dup 1)
(match_operand:SI 2 "arith_operand" "rI")) ;; value for op
(match_operand:SI 3 "const_int_operand")] ;; model
UNSPEC_SYNC_OLD_OP_SUBWORD))
(match_operand:SI 4 "arith_operand" "rI") ;; mask
(match_operand:SI 5 "arith_operand" "rI") ;; not_mask
(clobber (match_scratch:SI 6 "=&r")) ;; tmp_1
(clobber (match_scratch:SI 7 "=&r"))] ;; tmp_2
Note carefully operands #4 and #5 and the fact they are a toplevel construct as
opposed to being an operand of another RTX. That's a no-no. They need to be
wrapped with a USE.
I spot-checked sync.md and found a few more instances. Fixing the set I found
fixed the testsuite regressions I was seeing and also fixes the mis-compilation
of libgo. Bootstrapped and regression tested on my BPI and Pioneer. It's also
clean on the riscv64-elf and riscv32-elf targets in my tester.
PR target/123283
gcc/
* config/riscv/sync.md (subword_atomic_fetch_strong_nand): Add
USEs for naked operands that might be pseudos.
(subword_atomic_fetch_strong_<atomic_optab>): Likewise.
(subword_atomic_exchange_strong): Likewise.
(subword_atomic_cas_strong): Likewise.
Eric Botcazou [Fri, 26 Dec 2025 13:52:32 +0000 (14:52 +0100)]
Ada: Fix bogus error on aggregate in call with qualified type in instance
This happens with a container aggregate in the testcase, although this can
very likely happen with a record aggregate as well. The trick used in the
Save_Global_References procedure for aggregates loses the qualification of
the type of the formal for which the aggregate is the actual.
gcc/ada/
PR ada/123302
* sem_ch12.adb (Save_Global_Reference.Save_References_In_Aggregate):
Recurse on the scope of the type to find one that is visible, in the
case of an actual in a subprogram call with a local type.
gcc/testsuite/
* gnat.dg/aggr34.adb: New test.
* gnat.dg/aggr34_pkg1.ads, gnat.dg/aggr34_pkg1.adb: New helper.
* gnat.dg/aggr34_pkg2.ads, gnat.dg/aggr34_pkg2.adb: Likewise.
* gnat.dg/aggr34_pkg3.ads: Likewise.
Egas Ribeiro [Mon, 22 Dec 2025 21:41:00 +0000 (21:41 +0000)]
c-family: Fix ICE with -MD and -fdeps-format sharing output [PR121864]
When -MD, -fdeps-format=p1689r5 and -save-temps are used without
explicit output files, they default to the same stream, which is
invalid. The error message attempted to print fdeps_file, but this is
NULL in this case, causing an ICE.
Use out_fname as a fallback when fdeps_file is NULL to avoid the ICE
and provide a meaningful error message.
Fix suggested by Andrew Pinski.
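The shape of the fallback can be sketched as follows. name_for_diagnostic
is a hypothetical helper, and the assumption that the output name is always
non-null is mine; the real change is in c_common_finish and is not
reproduced here.

```cpp
#include <cassert>

// Hypothetical helper: pick the file name to print in the diagnostic,
// falling back to the output name when no deps file was given.
inline const char *
name_for_diagnostic (const char *fdeps_file, const char *out_fname)
{
  assert (out_fname != nullptr);  // Assumption made for this sketch only.
  return fdeps_file ? fdeps_file : out_fname;
}
```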
PR c++/121864
gcc/c-family/ChangeLog:
* c-opts.cc (c_common_finish): Use out_fname as fallback when
fdeps_file is NULL in error message.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Eric Botcazou [Fri, 26 Dec 2025 09:44:57 +0000 (10:44 +0100)]
Ada: Fix illegal Aggregate aspect not rejected
The Ada 2022 RM is adamant that the names specified in the Aggregate aspect
must denote "exactly one" subprogram, in other words that it is illegal to
use names that denote more than one subprogram in the Aggregate aspect.
gcc/ada/
PR ada/123289
* sem_ch13.adb (Resolve_Aspect_Aggregate.Resolve_Operation): Give
an error if the operation's name denotes more than one subprogram.
gcc/testsuite/
* gnat.dg/specs/aggr9.ads: New test.
Sandra Loosemore [Sun, 21 Dec 2025 02:23:43 +0000 (02:23 +0000)]
doc, riscv: Clean up RISC-V extensions documentation
This patch fixes a number of problems I observed in the RISC-V
extensions documentation, which is autogenerated from .def files:
- The formatting of the table looked terrible in the PDF output, with
overlapping text. I made the first two columns wider to fix this.
- Also the extension names in the table should have @samp{} markup.
- Many extensions were missing a full name/description. (Documenting
something as "xyzzy extension" adds nothing useful to readers when we
are already listing the extension name "xyzzy" in the table.)
- Irregular spelling and capitalization in the full names.
Sandra Loosemore [Sun, 14 Dec 2025 00:38:48 +0000 (00:38 +0000)]
doc, riscv: Clean up documentation of RISC-V options [PR122243]
gcc/ChangeLog
PR other/122243
* config/riscv/riscv.opt (mplt): Mark deprecated option Undocumented.
(msmall-data-limit=): Mark RejectNegative.
* doc/invoke.texi (Option Summary) <RISC-V Options>: Remove -mplt
documentation. Only list one form of each option. Add missing
options -mcpu, -mscalar-strict-align, -mno-vector-strict-align,
-momit-leaf-frame-pointer, -mstringop-strategy, -mrvv-vector-bits,
-mrvv-max-lmul, -madjust-lmul-cost, -mmax-vectorization, and
-mno-autovec-segment.
(RISC-V Options): Remove -mplt documentation. Add documentation for
missing options listed above. Add missing index entries for negative
forms. Correct the default for the -minline-str* options, which
has changed. Copy-edit for markup, spelling, and usage. Trivial
whitespace fixes.
Then, sincos attempts to find the type of the IFN_SIN/IFN_COS via
mathfn_built_in_type. This fails, so the compiler crashes.
For these IFNs, their input type is the same as their output type, so
we can fall back to that.
Note that, currently, GCC can't seem to handle vector sincos/cexpi
operations, so any attempt to CSE these will fail quickly after. This
patch does not fix that, only the ICE that happens in the attempt.
gcc/ChangeLog:
* tree-ssa-math-opts.cc (execute_cse_sincos_1): If
mathfn_built_in_type fails to determine a type for our
operation, presume that it is the same as the input type.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/sincos-ice-on-ifn_sin-call.c: New test.
* gcc.target/gcn/sincos-ice-on-ifn_sin-call-1.c: New test.
Pan Li [Sun, 21 Dec 2025 12:07:43 +0000 (20:07 +0800)]
RISC-V: Combine vec_duplicate + vmsleu.vv to vmsleu.vx on GR2VR cost
This patch combines vec_duplicate + vmsleu.vv into vmsleu.vx, as in
the example below. The related pattern depends on the cost of
vec_duplicate from GR2VR: late-combine will perform the combination
if the GR2VR cost is zero, and reject it if the GR2VR cost is greater
than zero.
Assume we have asm code like below, where the GR2VR cost is 0.
After this patch:
11 beq a3,zero,.L8
...
14 .L3:
15 vsetvli a5,a3,e32,m1,ta,ma
...
20 vmsleu.vx v1,a2,v3
...
23 bne a3,zero,.L3
gcc/ChangeLog:
* config/riscv/predicates.md: Add geu to the swappable
cmp operator iterator.
* config/riscv/riscv-v.cc (get_swapped_cmp_rtx_code): Take
care of the swapped rtx code correspondingly.
aarch64: Add the ability to have three types in an sve/sme intrinsic name
The majority of sve/sme intrinsics have names which are defined by one type
(like svuint8_t svextq[_u8]) or two types (like svsub_za32[_f32]_vg1x2).
Some intrinsics now have three types (like svtmopa_lane_za32[_s8_u8]).
This change extends the number of type_suffix_indexes from two to three
to cover this case.
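As a loose illustration of what a third type suffix means for naming:
type_suffix_triple_sketch and build_name below are hypothetical, an empty
string stands for an unused suffix slot, and the real
function_builder::get_name also deals with details (such as predication
suffixes) that are ignored here.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical: up to three type suffixes, "" meaning "slot unused".
using type_suffix_triple_sketch = std::array<std::string, 3>;

// Concatenate the base name, the type suffixes in order, and an optional
// group suffix such as "_vg1x2".
inline std::string
build_name (const std::string &base,
            const type_suffix_triple_sketch &suffixes,
            const std::string &group_suffix = "")
{
  assert (!base.empty ());
  std::string name = base;
  for (const std::string &s : suffixes)
    name += s;
  return name + group_suffix;
}
```

With this model the three examples from the text come out as
svextq_u8, svsub_za32_f32_vg1x2 and svtmopa_lane_za32_s8_u8.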
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
Replace use of type_suffix_pair with type_suffix_triple.
* config/aarch64/aarch64-sve-builtins-shapes.cc (parse_element_type):
Handle third type suffix.
(parse_type): Handle c2 in function signature. Add the u signature with
the ability to pass a tuple with twice as many vectors as the base type.
Calculate number of vectors against the type with the maximum number of
bits rather than "the other one".
(load_contiguous_base::resolve): Add argument to resolve_to call.
(compare_scalar_def::resolve): Likewise.
(ternary_mfloat8_def::resolve): Likewise.
(ternary_mfloat8_lane_def::resolve): Likewise.
(ternary_mfloat8_opt_n_def::resolve): Likewise.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_all_pred,
TYPES_all_count, TYPES_all_pred_count, TYPES_all_float,
TYPES_all_signed, TYPES_all_float_and_signed, TYPES_all_unsigned,
TYPES_all_integer, TYPES_all_arith, TYPES_all_data, TYPES_b, TYPES_c,
TYPES_b_unsigned, TYPES_b_integer, TYPES_b_data, TYPES_bh_integer,
TYPES_bs_unsigned, TYPES_bhs_signed, TYPES_bhs_unsigned,
TYPES_bhs_integer, TYPES_bh_data, TYPES_bhs_data, TYPES_bhs_widen,
TYPES_h_bfloat, TYPES_h_float, TYPES_h_integer, TYPES_h_data,
TYPES_hs_signed, TYPES_hs_integer, TYPES_hs_float, TYPES_hs_data,
TYPES_hd_unsigned, TYPES_hsd_signed, TYPES_hsd_integer, TYPES_hsd_data,
TYPES_h_float_mf8, TYPES_s_float, TYPES_s_float_mf8,
TYPES_s_float_hsd_integer, TYPES_s_float_sd_integer, TYPES_s_signed,
TYPES_s_unsigned, TYPES_s_integer, TYPES_s_data, TYPES_sd_signed,
TYPES_sd_unsigned, TYPES_sd_integer, TYPES_sd_data,
TYPES_all_float_and_sd_integer, TYPES_d_float, TYPES_d_unsigned,
TYPES_d_integer, TYPES_d_data, TYPES_cvt, TYPES_cvt_bfloat,
TYPES_cvt_h_s_float, TYPES_cvt_f32_f16, TYPES_cvt_long,
TYPES_cvt_narrow_s, TYPES_cvt_narrow, TYPES_cvt_s_s, TYPES_cvt_mf8,
TYPES_cvtn_mf8, TYPES_cvtnx_mf8, TYPES_inc_dec_n, TYPES_qcvt_x2,
TYPES_qcvt_x4, TYPES_qrshr_x2, TYPES_qrshru_x2, TYPES_qrshr_x4,
TYPES_qrshru_x4, TYPES_reinterpret, TYPES_reinterpret_b, TYPES_while,
TYPES_while_x, TYPES_while_x_c, TYPES_s_narrow_fsu, TYPES_all_za,
TYPES_d_za, TYPES_za_bhsd_data, TYPES_za_all_data, TYPES_za_h_mf8,
TYPES_za_hs_mf8, TYPES_za_h_bfloat, TYPES_za_h_float,
TYPES_za_s_b_signed, TYPES_za_s_b_unsigned, TYPES_za_s_b_integer,
TYPES_za_s_h_integer, TYPES_za_s_h_data, TYPES_za_s_unsigned,
TYPES_za_s_integer, TYPES_za_s_mf8, TYPES_za_s_float, TYPES_za_s_data,
TYPES_za_d_h_integer, TYPES_za_d_float, TYPES_za_d_integer,
TYPES_mop_base, TYPES_mop_base_signed, TYPES_mop_base_unsigned,
TYPES_mop_i16i64, TYPES_mop_i16i64_signed, TYPES_mop_i16i64_unsigned,
TYPES_za): Extend defines to three arguments.
(DEF_VECTOR_TYPE, DEF_DOUBLE_TYPE): Likewise.
(DEF_TRIPLE_TYPE): Add new define.
(DEF_SVE_TYPES_ARRAY): Redefine all types_ arrays into arrays of
type_suffix_triple.
(types_none): Likewise.
(function_instance::hash): Add third type to hash calculation.
(function_builder::get_name): Add third type to function name.
(function_builder::add_overloaded_functions): Handle third type.
(function_resolver::lookup_form): Likewise.
(function_resolver::resolve_to): Likewise.
(function_resolver::resolve_unary): Likewise.
* config/aarch64/aarch64-sve-builtins.h (type_suffix_triple): Replace
type_suffix_pair.
(function_group_info::types): Likewise.
(function_instance::ctor): Likewise.
(function_instance::type_suffix_ids): Likewise.
(function_resolver::lookup_form): Add third type argument.
(function_resolver::resolve_to): Likewise.
(function_instance::operator==): Add third type to equality calculation.
Karl Meakin [Wed, 24 Dec 2025 11:41:27 +0000 (11:41 +0000)]
aarch64: add 8-bit floating point dot product
This patch adds support for the following intrinsics when sme-f8f16 is enabled:
* svdot_za16[_mf8]_vg1x2_fpm
* svdot_za16[_mf8]_vg1x4_fpm
* svdot[_single]_za16[_mf8]_vg1x2_fpm
* svdot[_single]_za16[_mf8]_vg1x4_fpm
* svdot_lane_za16[_mf8]_vg1x2_fpm
* svdot_lane_za16[_mf8]_vg1x4_fpm
This patch adds support for the following intrinsics when sme-f8f32 is enabled:
* svdot_za32[_mf8]_vg1x2_fpm
* svdot_za32[_mf8]_vg1x4_fpm
* svdot[_single]_za32[_mf8]_vg1x2_fpm
* svdot[_single]_za32[_mf8]_vg1x4_fpm
* svdot_lane_za32[_mf8]_vg1x2_fpm
* svdot_lane_za32[_mf8]_vg1x4_fpm
* svvdot_lane_za32[_mf8]_vg1x2_fpm
* svvdotb_lane_za32[_mf8]_vg1x4_fpm
* svvdott_lane_za32[_mf8]_vg1x4_fpm
gcc:
* config/aarch64/aarch64-sme.md
(@aarch64_sme_<optab><SME_ZA_F8F16_32:mode><SME_ZA_FP8_x24:mode>): New insn.
(@aarch64_fvdot_half<optab>): Likewise.
(@aarch64_fvdot_half<optab>_plus): Likewise.
* config/aarch64/aarch64-sve-builtins-functions.h
(class svvdot_half_impl): New function impl.
* config/aarch64/aarch64-sve-builtins-sme.cc (FUNCTION): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc (struct dot_half_za_slice_lane_def):
New function shape.
* config/aarch64/aarch64-sve-builtins-shapes.h: Likewise.
* config/aarch64/aarch64-sve-builtins-sme.def (svdot): New function.
(svdot_lane): Likewise.
(svvdot_lane): Likewise.
(svvdotb_lane): Likewise.
(svvdott_lane): Likewise.
* config/aarch64/aarch64-sve-builtins-sme.h (svvdotb_lane_za): New function.
(svvdott_lane_za): Likewise.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_za_s_mf8): New types array.
(TYPES_za_hs_mf8): Likewise.
(za_hs_mf8): Likewise.
* config/aarch64/iterators.md (SME_ZA_F8F16): New mode iterator.
(SME_ZA_F8F32): Likewise.
(SME_ZA_FP8_x1): Likewise.
(SME_ZA_FP8_x2): Likewise.
(SME_ZA_FP8_x4): Likewise.
(UNSPEC_SME_FDOT_FP8): New unspec.
(UNSPEC_SME_FVDOT_FP8): Likewise.
(UNSPEC_SME_FVDOTT_FP8): Likewise.
(UNSPEC_SME_FVDOTB_FP8): Likewise.
(SME_FP8_DOTPROD): New int iterator.
(SME_FP8_FVDOT): Likewise.
(SME_FP8_FVDOT_HALF): Likewise.
gcc/testsuite:
* gcc.target/aarch64/sme2/acle-asm/dot_lane_za16_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_lane_za16_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_lane_za32_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_lane_za32_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_single_za16_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_single_za16_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_single_za32_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_single_za32_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_za16_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_za16_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_za32_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/dot_za32_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/vdot_lane_za16_mf8_vg1x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/vdotb_lane_za32_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/vdott_lane_za32_mf8_vg1x4.c: New test.
* gcc.target/aarch64/sve/acle/general-c/dot_half_za_slice_lane_fpm.c: New test.
aarch64: add 8-bit floating-point sum of outer products and accumulate
This patch adds support for FMOPA (widening, 2-way, FP8 to FP16) when
sme-f8f16 is enabled using svmopa_za16[_mf8]_m_fpm and for FMOPA (widening,
4-way) when sme-f8f32 is enabled using svmopa_za32[_mf8]_m_fpm.
Asm tests for the new intrinsics are added, similar to those for existing
mopa_z16 intrinsics. Tests for the binary_za_m shape are added.
gcc:
* config/aarch64/aarch64-sme.md
(@aarch64_sme_<optab><SME_ZA_F8F16_32:mode><VNx16QI_ONLY:mode>): Add
new define_insn.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(struct binary_za_m_base): Support fpm argument.
* config/aarch64/aarch64-sve-builtins-sme.cc (svmopa_za): Extend for
fp8.
* config/aarch64/aarch64-sve-builtins-sme.def (svmopa): Add new
DEF_SME_ZA_FUNCTION_GS_FPM entries.
aarch64: add Multi-vector 8-bit floating-point multiply-add long
This patch adds support for the following intrinsics when sme-f8f16 is enabled:
* svmla_lane_za16[_mf8]_vg2x1_fpm
* svmla_lane_za16[_mf8]_vg2x2_fpm
* svmla_lane_za16[_mf8]_vg2x4_fpm
* svmla_za16[_mf8]_vg2x1_fpm
* svmla[_single]_za16[_mf8]_vg2x2_fpm
* svmla[_single]_za16[_mf8]_vg2x4_fpm
* svmla_za16[_mf8]_vg2x2_fpm
* svmla_za16[_mf8]_vg2x4_fpm
This patch adds support for the following intrinsics when sme-f8f32 is enabled:
* svmla_lane_za32[_mf8]_vg4x1_fpm
* svmla_lane_za32[_mf8]_vg4x2_fpm
* svmla_lane_za32[_mf8]_vg4x4_fpm
* svmla_za32[_mf8]_vg4x1_fpm
* svmla[_single]_za32[_mf8]_vg4x2_fpm
* svmla[_single]_za32[_mf8]_vg4x4_fpm
* svmla_za32[_mf8]_vg4x2_fpm
* svmla_za32[_mf8]_vg4x4_fpm
Asm tests for the 32 bit versions follow the blueprint set in
mla_lane_za32_u8_vg4x1.c, mla_za32_u8_vg4x1.c and similar.
16 bit versions follow similar patterns modulo differences in allowed offsets.
gcc:
* config/aarch64/aarch64-sme.md
(@aarch64_sme_<optab><SME_ZA_F8F16_32:mode><SME_ZA_FP8_x24:mode>): Add
new define_insn.
(*aarch64_sme_<optab><VNx8HI_ONLY:mode><SME_ZA_FP8_x24:mode>_plus,
*aarch64_sme_<optab><VNx4SI_ONLY:mode><SME_ZA_FP8_x24:mode>_plus,
@aarch64_sme_<optab><SME_ZA_F8F16_32:mode><VNx16QI_ONLY:mode>,
*aarch64_sme_<optab><VNx8HI_ONLY:mode><VNx16QI_ONLY:mode>_plus,
*aarch64_sme_<optab><VNx4SI_ONLY:mode><VNx16QI_ONLY:mode>_plus,
@aarch64_sme_single_<optab><SME_ZA_F8F16_32:mode><SME_ZA_FP8_x24:mode>,
*aarch64_sme_single_<optab><VNx8HI_ONLY:mode><SME_ZA_FP8_x24:mode>_plus,
*aarch64_sme_single_<optab><VNx4SI_ONLY:mode><SME_ZA_FP8_x24:mode>_plus,
@aarch64_sme_lane_<optab><SME_ZA_F8F16_32:mode><SME_ZA_FP8_x124:mode>,
*aarch64_sme_lane_<optab><VNx8HI_ONLY:mode><SME_ZA_FP8_x124:mode>,
*aarch64_sme_lane_<optab><VNx4SI_ONLY:mode><SME_ZA_FP8_x124:mode>):
Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(struct binary_za_slice_lane_base): Support fpm argument.
(struct binary_za_slice_opt_single_base): Likewise.
* config/aarch64/aarch64-sve-builtins-sme.cc (svmla_za): Extend for fp8.
(svmla_lane_za): Likewise.
* config/aarch64/aarch64-sve-builtins-sme.def (svmla_lane): Add new
DEF_SME_ZA_FUNCTION_GS_FPM entries.
(svmla): Likewise.
* config/aarch64/iterators.md (SME_ZA_F8F16_32): Add new mode iterator.
(SME_ZA_FP8_x24, SME_ZA_FP8_x124): Likewise.
(UNSPEC_SME_FMLAL): Add new unspec.
(za16_offset_range): Add new mode_attr.
(za16_32_long): Likewise.
(za16_32_last_offset): Likewise.
(SME_FP8_TERNARY_SLICE): Add new iterator.
(optab): Add entry for UNSPEC_SME_FMLAL.
gcc/testsuite:
* gcc.target/aarch64/sme2/acle-asm/test_sme2_acle.h: (TEST_ZA_X1,
TEST_ZA_XN, TEST_ZA_SINGLE, TEST_ZA_SINGLE_Z15, TEST_ZA_LANE,
TEST_ZA_LANE_Z15): Add fpm0 parameter.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_1.c: Add
tests for variants accepting fpm.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_1.c:
Likewise.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x1.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za16_mf8_vg2x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x1.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_lane_za32_mf8_vg4x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x1.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_za16_mf8_vg2x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x1.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/mla_za32_mf8_vg4x4.c: New test.
aarch64: add basic support for sme-f8f16 and sme-f8f32
This patch adds support for the SME_F8F16 and SME_F8F32 features as architecture
options, along with related definitions. This support is required for subsequent
intrinsics to work.
gcc/
* config/aarch64/aarch64.h:
(TARGET_STREAMING_SME_F8F16, TARGET_STREAMING_SME_F8F32): Add defines.
* config/aarch64/aarch64-c.cc:
(__ARM_FEATURE_SME_F8F16, __ARM_FEATURE_SME_F8F32): Add defines.
* config/aarch64/aarch64-option-extensions.def:
(sme-f8f16, sme-f8f32): Add arch options in command line.
* config/aarch64/aarch64-sve-builtins-functions.h:
(sme_2mode_function_t): Pass unspec_for_mfp8 parameter through ctor.
* config/aarch64/aarch64-sve-builtins-sme.def:
(DEF_SME_FUNCTION_GS, DEF_SME_FUNCTION): Redefine based on
DEF_SME_FUNCTION_GS_FPM.
(DEF_SME_ZA_FUNCTION_GS, DEF_SME_ZA_FUNCTION): Redefine based on
DEF_SME_ZA_FUNCTION_GS_FPM.
(AARCH64_FL_SME_F8F16, AARCH64_FL_SME_F8F32): Add new
REQUIRED_EXTENSIONS sections.
* config/aarch64/aarch64-sve-builtins.cc:
(TYPES_za_h_mf8): Add new types.
(TYPES_za_s_mf8): Likewise.
(sme_function_groups): Define using DEF_SME_FUNCTION_GS_FPM instead of
DEF_SME_FUNCTION_GS.
* doc/invoke.texi: (sme-f8f16, sme-f8f32): Add documentation of option.
gcc/testsuite/
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Add tests checking that
sme-f8f16 and sme-f8f32 predefs are off by default, and checks for
feature dependencies.
* lib/target-supports.exp: Add check_effective_target support for
sme-f8f16 and sme-f8f32.
Test structure is based on the urshl ones that have a similar structure in how
they treat arguments.
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svscale_impl): Added new
class for dealing with all svscale functions (including sve)
(svscale): updated FUNCTION macro call to make use of new class.
* config/aarch64/aarch64-sve-builtins-sve2.def: (svscale):
Added new DEF_SVE_FUNCTION_GS call to enable recognition of new variant.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_fscale<mode>): Added
new define_insn. (@aarch64_sve_single_fscale<mode>): Likewise.
* config/aarch64/iterators.md: (SVE_Fx24_NOBF): Added new iterator,
similar to SVE_Fx24 but without brainfloat.
(SVE_Fx24): Updated to make use of SVE_Fx24_NOBF.
(SVSCALE_SINGLE_INTARG): Added new mode_attr.
(SVSCALE_INTARG): Likewise.
This patch adds the following intrinsics (all __arm_streaming only) along with
asm tests for them under the +sme2+fp8 flags:
- svfloat16x2_t svcvt1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvt2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvt1_bf16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvt2_bf16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvtl1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvtl2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvtl1_bf16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
- svfloat16x2_t svcvtl2_bf16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm)
gcc/
* config/aarch64/aarch64-sve-builtins-sve2.cc (svcvtl1, svcvtl2): Added
new FUNCTIONs.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svcvt1, svcvt2, svcvtl1, svcvtl2): Added new DEF_SVE_FUNCTION_GS_FPM.
* config/aarch64/aarch64-sve-builtins-sve2.h (svcvtl1, svcvtl2): Added
new function_base.
* config/aarch64/aarch64-sve-builtins.cc
(function_resolver::resolve_unary): use group_suffix_id when resolving
C overloads.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve2_fp8_cvt_<fp8_cvt_uns_op><mode>): Added new define_insn.
* config/aarch64/aarch64.h (TARGET_SSME2_FP8): Added new define.
* config/aarch64/iterators.md
(UNSPEC_F1CVTL, UNSPEC_F2CVTL): Added new unspecs.
(FP8CVT_UNS): Extended int_iterator.
(fp8_cvt_uns_op): Likewise.
gcc/testsuite/
* g++.target/aarch64/sme2/aarch64-sme2-acle-asm.exp: Use tuning flag
to reduce churn in testsuites.
* gcc.target/aarch64/sme2/aarch64-sme2-acle-asm.exp: Likewise.
* gcc.target/aarch64/sme2/acle-asm/cvt_mf8_x2.c: Added test file.
* gcc.target/aarch64/sme2/acle-asm/cvtl_mf8_x2.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_X2_WIDE): Added
fpm0 argument for intrinsics.
In a GCC configuration with both AMD and NVIDIA GPU code offloading supported,
and the selected AMD GPU code generation not supporting USM, but a USM-capable
NVIDIA GPU available, I see all test cases that require effective-target
'omp_usm' turn UNSUPPORTED, because:
Executing on host: gcc usm_available_2778376.c [...]
[...]
In function 'main._omp_fn.0':
lto1: warning: Unified Shared Memory is required, but XNACK is disabled
lto1: note: Try -foffload-options=-mxnack=any
gcn mkoffload: warning: conflicting settings; XNACK is forced off but Unified Shared Memory is required
UNSUPPORTED: [...]
That warning is, however, not relevant in the scenario described above: we're
not going to exercise AMD GPU code offloading at run time.
With the effective-target 'omp_usm' check robustified like this, the affected
test cases are then no longer UNSUPPORTED, but of course, there's then the
corollary issue that compilation of the test case itself now emits the very
same warning, which results in the "test for excess errors" FAILing, despite
the execution test PASSing, for example:
FAIL: libgomp.c++/target-std__valarray-concurrent-usm.C (test for excess errors)
PASS: libgomp.c++/target-std__valarray-concurrent-usm.C execution test
That's clearly not ideal either (but is representative of what real-world usage
would run into), but is certainly better than the whole test case turning
UNSUPPORTED. To be continued, I guess...
Andrew Pinski [Tue, 23 Dec 2025 21:30:00 +0000 (13:30 -0800)]
ifcvt: Move noce_try_cond_zero_arith last
I noticed that on x86_64 and aarch64, noce_try_cond_zero_arith
would produce worse code than noce_try_cmove_arith.
So we should do noce_try_cond_zero_arith last instead
of before noce_try_cmove_arith.
Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
Also checked to make sure riscv testcases still work.
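For readers unfamiliar with the two transformations, here is a rough
source-level sketch of the shapes involved. It assumes that
noce_try_cond_zero_arith turns a conditional arithmetic update into
arithmetic on a conditionally zeroed operand, and that noce_try_cmove_arith
computes both values and selects one. The function names are hypothetical and
the real passes work on RTL, not on C++.

```cpp
#include <cassert>
#include <cstdint>

// Source shape: the addition only happens when cond is nonzero.
std::uint64_t branchy(std::uint64_t a, std::uint64_t b, int cond)
{
  if (cond)
    a += b;
  return a;
}

// Conditional-zero shape (illustrative): zero the second operand when
// cond is false, then do the addition unconditionally.
std::uint64_t cond_zero_form(std::uint64_t a, std::uint64_t b, int cond)
{
  std::uint64_t masked = cond ? b : 0;
  return a + masked;
}

// Conditional-move shape (illustrative): compute the sum up front and
// select between the sum and the original value.
std::uint64_t cmove_form(std::uint64_t a, std::uint64_t b, int cond)
{
  std::uint64_t sum = a + b;
  return cond ? sum : a;
}
```

All three functions return the same value for every input; which shape gives
better code depends on the target, which is why the ordering matters.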
gcc/ChangeLog:
* ifcvt.cc (noce_process_if_block): Move noce_try_cond_zero_arith
last.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Tue, 23 Dec 2025 21:04:28 +0000 (13:04 -0800)]
ifcvt: Only allow scalar integral modes for noce_try_cond_zero_arith [PR123276]
This is the simple fix for PR 123276 where this code can only handle scalar
integral modes. We could in theory handle scalar floating point modes here
too but it is not worth the trouble.
Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
PR rtl-optimization/123276
gcc/ChangeLog:
* ifcvt.cc (noce_try_cond_zero_arith): Reject non-scalar integral modes.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Nathaniel Shead [Sun, 7 Dec 2025 12:17:15 +0000 (23:17 +1100)]
c++: Non-inline temploid friends should still be COMDAT [PR122819]
Modules allow temploid friends to no longer be implicitly inline, as
functions defined in a class body will not be implicitly inline if
attached to a named module.
This requires us to clean up linkage handling a little bit, mostly by
replacing usages of 'DECL_TEMPLATE_INSTANTIATION' with
'DECL_TEMPLOID_INSTANTIATION' when determining if an entity has vague
linkage.
This caused the friend88.C testcase to miscompile however, as 'foo' was
incorrectly having 'DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION' getting
set because it was keeping its tinfo.
This is because 'non_templated_friend_p' was returning 'false', since
the function didn't have a primary template. But that's expected I
think here, so fixed by also returning true for friend declarations
pushed into namespace scope, which still allows dependent nested friends
to be considered templated.
PR c++/122819
gcc/cp/ChangeLog:
* decl.cc (start_preparsed_function): Use
DECL_TEMPLOID_INSTANTIATION instead of
DECL_TEMPLATE_INSTANTIATION to check vague linkage.
* decl2.cc (vague_linkage_p): Likewise.
(c_parse_final_cleanups): Simplify condition.
* pt.cc (non_templated_friend_p): Namespace-scope friend
function declarations without a primary template are still
non-templated.
* semantics.cc (expand_or_defer_fn_1): Also check for temploid
friend functions.
gcc/testsuite/ChangeLog:
* g++.dg/modules/tpl-friend-22.C: New test.
* g++.dg/template/friend88.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
* a68.h (a68_file_size): Changed to use file descriptor.
(a68_file_read): Likewise.
* a68-parser-scanner.cc (a68_file_size): Likewise.
(a68_file_read): Likewise.
(read_source_file): Adapt `a68_file_{size,read}'.
(include_files): Likewise.
* a68-lang.cc (a68_handle_option): Likewise.
* a68-imports.cc (a68_find_export_data): Implement
reading from module's .m68 file if available.
gcc/testsuite/ChangeLog
* algol68/compile/modules/compile.exp (dg-data): New procedure
for writing binary test data to disk.
* algol68/compile/modules/program-m68-lp64.a68: New test which
embeds binary module data.
* algol68/compile/modules/program-m68-llp64.a68: Likewise.
* algol68/compile/modules/program-m68-ilp32.a68: Likewise.
* algol68/compile/modules/program-m68-lp64-be.a68: Likewise.
* algol68/compile/modules/program-m68-llp64-be.a68: Likewise.
Jeff Law [Tue, 23 Dec 2025 20:25:47 +0000 (13:25 -0700)]
[committed][RISC-V][PR target/123274] Add missing condition in usmul<mode>3 pattern
As Andrew P. noted in the BZ, the expander is missing elements in its condition
leading to generation of an insn that can't be matched.
This adds the necessary condition to the usmul<mode>3 expander which in turn
fixes the ICE. I just checked and that expander wasn't in gcc-15, so this is
just a gcc-16 issue.
Tested on riscv32-elf and riscv64-elf. I have a bootstrap in flight on the
Pioneer, but I'm not expecting any surprises. Much like the patch earlier
today, I'm going to push this now rather than wait for pre-commit CI.
Jeff Law [Tue, 23 Dec 2025 19:34:44 +0000 (12:34 -0700)]
[RISC-V][PR target/123278] Handle BF/HF modes in Andes 45 series pipeline description
So a standard run-of-the-mill case where we're testing modes to determine what
reservation to use in a pipeline model and modes were missing (BF/HF in this
case).
This adds the BF/HF cases to the fp_alu_s, fpu_mul_s and fpu_mac_s units for
the Andes 45 series. It may ultimately be the case that even lower latencies
are available for these ops, but that's something folks with a better
understanding of the Andes 45 series uarch would need to tackle.
Tested on riscv32-elf and riscv64-elf. Given the nature of the change and the
fact that I expect to be out of the office most of the next few days, I'm going
to go ahead and push without waiting for pre-commit CI. There's minimal risk.
Milan Tripkovic [Tue, 23 Dec 2025 16:39:41 +0000 (09:39 -0700)]
[RISC-V][PATCH] Adjust clmul latency in Spacemit X60 scheduler model
This patch adjusts the instruction scheduling and cost model for the Zbc
(CLMUL) extension on the Spacemit X60 core.
The tuning was evaluated using three configurations (CLMUL2, CLMUL3,
and the baseline CLMUL5) across a variety of hashing and encryption kernels.
Yuao Ma [Tue, 23 Dec 2025 14:54:34 +0000 (22:54 +0800)]
c++: clarify the comment regarding where the default dialect is set
Since r6-7026-g268be88cbeaba7, the default dialect has been set in
c_common_init_options rather than c_common_post_options. This patch updates the
corresponding comment to reflect that change.
Egas Ribeiro [Fri, 19 Dec 2025 21:34:55 +0000 (21:34 +0000)]
c++: Fix member-like friend detection for non-template classes [PR122550]
member_like_constrained_friend_p was incorrectly returning true for
constrained friend function templates declared in non-template classes,
causing them to be treated as distinct from their forward declarations.
This led to ambiguity errors at call sites.
Per [temp.friend]/9, a constrained friend is only "member-like" (and thus
declares a different function) in two cases:
1. Non-template friends with constraints (must be in a templated class)
2. Template friends whose constraints depend on outer template parameters
In both cases, the enclosing class scope must be templated. The fix adds
a check for CLASSTYPE_IMPLICIT_INSTANTIATION to ensure the friend's
context is actually a class template, not a plain class or explicit
specialization.
PR c++/122550
gcc/cp/ChangeLog:
* decl.cc (member_like_constrained_friend_p): Check that the
friend's enclosing class is an implicit instantiation.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-friend18.C: New test.
* g++.dg/cpp2a/concepts-friend18a.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com> Reviewed-by: Patrick Palka <ppalka@redhat.com>
Egas Ribeiro [Mon, 22 Dec 2025 22:30:12 +0000 (22:30 +0000)]
c++: Fix ICE on partial specialization redeclaration with mismatched parameters [PR122958]
When a partial specialization was redeclared with different template
parameters, maybe_new_partial_specialization was incorrectly treating it
as the same specialization by only comparing template argument lists
without comparing template-heads. This caused an ICE when the
redeclaration had different template parameters.
Per [temp.spec.partial.general]/2, two partial specializations declare
the same entity only if they have equivalent template-heads and
template argument lists.
Fix by comparing template parameter lists (template-heads) in addition
to template argument lists when checking for existing specializations,
and removing flag_concepts to provide diagnostics before c++20 for the
testcase.
PR c++/122958
gcc/cp/ChangeLog:
* pt.cc (maybe_new_partial_specialization): Compare template
parameter lists when checking for existing specializations and
remove flag_concepts check.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/partial-spec-redecl.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Fix cfg_attr expansion and feature gate attribute handling
Fixes Rust-GCC#4245
gcc/rust/ChangeLog:
* checks/errors/feature/rust-feature-gate.cc (FeatureGate::visit): Added
handling for META_ITEM type attributes to properly process feature gates.
* expand/rust-cfg-strip.cc (expand_cfg_attrs): Fixed a bug where
newly inserted cfg_attr attributes weren't being reprocessed,
and cleaned up the loop increment logic.
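The reprocessing issue can be pictured with a toy model. This is a sketch
only: Attr and expand_cfg_attrs_sketch are hypothetical names, predicates are
assumed to be always true, and none of this is the code in
rust-cfg-strip.cc. The point is that the index is not advanced after an
expansion, so a nested cfg_attr inserted at the current position is visited
again.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy attribute: either a plain attribute or a cfg_attr wrapping others.
struct Attr
{
  std::string name;
  bool is_cfg_attr;
  std::vector<Attr> inner;  // attributes produced when a cfg_attr expands
};

// Replace every cfg_attr by the attributes it wraps, in place.
void expand_cfg_attrs_sketch(std::vector<Attr> &attrs)
{
  std::size_t i = 0;
  while (i < attrs.size())
    {
      if (!attrs[i].is_cfg_attr)
        {
          ++i;  // plain attribute: keep it and move on
          continue;
        }
      std::vector<Attr> inner = std::move(attrs[i].inner);
      auto pos = attrs.begin() + static_cast<std::ptrdiff_t>(i);
      pos = attrs.erase(pos);
      attrs.insert(pos, inner.begin(), inner.end());
      // Deliberately no ++i here: position i now holds the first inserted
      // attribute, which may itself be a cfg_attr and must be reprocessed.
    }
}
```

Advancing the index unconditionally would skip the first inserted attribute,
which is the kind of bug the entry above describes.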
Lucas Ly Ba [Fri, 14 Nov 2025 21:07:00 +0000 (21:07 +0000)]
gccrs: refactor unused var lint
gcc/rust/ChangeLog:
* checks/lints/unused-var/rust-unused-var-checker.cc (UnusedVarChecker::visit):
Change unused name warning to unused variable warning.
* checks/lints/unused-var/rust-unused-var-collector.cc (UnusedVarCollector::visit):
Remove useless methods.
* checks/lints/unused-var/rust-unused-var-collector.h: Same here.
* checks/lints/unused-var/rust-unused-var-context.cc (UnusedVarContext::add_variable):
Add used variables to set.
(UnusedVarContext::mark_used): Remove method.
(UnusedVarContext::is_variable_used):
Check if the set contains the hir id linked to a variable.
(UnusedVarContext::as_string): Refactor method for new set.
* checks/lints/unused-var/rust-unused-var-context.h: Refactor methods.
* lang.opt: Change description for unused check flag.
Ryutaro Okada [Sun, 10 Aug 2025 02:24:56 +0000 (19:24 -0700)]
gccrs: implement unused variable checker on HIR.
This change moves the unused variable checker from the type resolver
to HIR. We can now use the HIR Default Visitor, and it will be much
easier to implement other unused lints with this change.
Harishankar [Mon, 24 Nov 2025 20:41:33 +0000 (02:11 +0530)]
gccrs: Fix ICE with continue/break/return in while condition
Fixes Rust-GCC/gccrs#3977
The predicate expression must be evaluated before type checking
to ensure side effects occur even when the predicate has never type.
This prevents skipping function calls, panics, or other side effects
in diverging predicates.
* backend/rust-compile-expr.cc (CompileExpr::visit): Always
evaluate predicate expression before checking for never type
to preserve side effects in while loop conditions.
* typecheck/rust-hir-type-check-expr.cc: Update handling of break/continue.
Egas Ribeiro [Fri, 19 Dec 2025 16:58:58 +0000 (16:58 +0000)]
c++: Fix ICE with lambdas combining explicit and implicit template params [PR117518]
When a lambda with explicit template parameters like []<int> also has
implicit template parameters from auto, and is used as a default
template argument, processing_template_parmlist remained set
from the outer template context. This caused
function_being_declared_is_template_p to incorrectly return false,
leading synthesize_implicit_template_parm to create a new template
scope instead of extending the existing one, resulting in a binding
level mismatch and an ICE in poplevel_class.
Fix by clearing processing_template_parmlist in
cp_parser_lambda_expression alongside the other parser state
save/restore operations.
PR c++/117518
gcc/cp/ChangeLog:
* parser.cc (cp_parser_lambda_expression): Clear
processing_template_parmlist when parsing lambda body.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/lambda-targ19.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Xi Ruoyao [Thu, 18 Dec 2025 03:39:38 +0000 (11:39 +0800)]
LoongArch: relax the check for --with-tune
Someone (via a WeChat group) reported that --with-arch=la464
--with-tune=la664 had stopped working after committing the LA32 support.
While this can be treated as a simple logic error (i.e. we may simply
change "loongarch64" in the case statement to an asterisk), IMO we
should just relax the check: at runtime the "unreasonable" combinations
like "-march=la64v1.0 -mtune=loongarch32" or "-march=la664 -mtune=la464"
are allowed (and the second case has been allowed for a long time), and a
combination of --with-arch=A --with-tune=T should be allowed if -march=A
-mtune=T is allowed at runtime.
Also if we consider the fact that --with-tune= and -mtune= only select a
set of heuristic parameters, such combinations may be not so
unreasonable.
gcc/
* config.gcc: Relax the check for LoongArch with_tune.
Andrew Pinski [Tue, 23 Dec 2025 01:58:35 +0000 (17:58 -0800)]
ifcvt: Fix noce_try_cond_zero_arith after get_base_reg change [PR123267]
A few fixes are needed after the change to get_base_reg in
r16-6333-gac64ceb33bf05b. First, we need to use the correct target mode
of the operand; that is, if we are doing a subreg of QI mode, we use
QImode for the conditional move.
Second we also need to use the original operands instead of the ones
removing the subreg still.
Pushed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR rtl-optimization/123267
gcc/ChangeLog:
* ifcvt.cc (noce_try_cond_zero_arith): Pass the original operands
of a instead of the stripped off values. Use the mode of the operand
which is being used.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr123267-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
AutoFDO: Implement summary information in auto-profile
This patch aims to implement summary support in auto-profile, similar to
LLVM. The summary information stores various information about the
profile being read such as the number of functions, the maximum sample
count, the total number of samples and so on.
It also adds a section called the "detailed summary" which contains a
histogram-based calculation of the minimum execution count for a sample
needed to belong to a specific percentile of samples. This is used to
decide the hot count threshold (which can be controlled with a command
line parameter). The default is any sample belonging to the 99th percentile
being marked as hot.
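To illustrate the idea behind the detailed summary, here is a minimal sketch
of a percentile-based hot threshold. It is not the auto-profile.cc
implementation: hot_count_threshold is a hypothetical helper, it sorts the
raw counts instead of using a histogram, and it assumes the products fit in
64 bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Return the smallest sample count that still belongs to the hottest
// `percentile` percent of all samples (hypothetical helper).
std::uint64_t hot_count_threshold(std::vector<std::uint64_t> counts,
                                  unsigned percentile)
{
  assert(percentile <= 100);

  // Visit the hottest samples first.
  std::sort(counts.begin(), counts.end(), std::greater<std::uint64_t>());

  std::uint64_t total = 0;
  for (std::uint64_t c : counts)
    total += c;

  // Accumulate counts until the running sum covers the requested share
  // of the total; the count reached at that point is the threshold.
  std::uint64_t running = 0;
  for (std::uint64_t c : counts)
    {
      running += c;
      if (running * 100 >= total * percentile)
        return c;
    }
  return 0;  // empty profile: nothing is hot
}
```

With such a threshold, any sample whose count is at least the returned value
would be treated as hot.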
This patch requires the changes from https://github.com/google/autofdo/pull/251
to work correctly.
* auto-profile.cc (string_table::~string_table): Update to free
original_names_map_.
(string_table::original_names_map_): New member.
(string_table::clashing_names_map_): Likewise.
(string_table::get_original_name): New function.
(string_table::read): Figure out clashes while reading.
(autofdo_source_profile::offline_external_functions): Call
get_original_name.
Nathaniel Shead [Thu, 4 Dec 2025 13:03:46 +0000 (00:03 +1100)]
c++/modules: Ignore exposures in lambdas in initializers [PR122994]
As the PR rightly points out, a lambda is not really a declaration in
and of itself by the standard, and so a lambda only used in a context
where exposures are ignored should not itself cause an error.
This patch implements this by way of a new flag set on deps that are
first found in an ignored context. This flag gets cleared if we ever
see the dep in a context where exposures are not ignored. Then, while
walking a declaration with this flag set, we re-establish an ignored
context. This is done for all decls (not just lambdas) to handle
block-scope classes as well.
Additionally, we prevent walking of attached declarations for a
DECL_MODULE_KEYED_DECLS_P entity during dependency gathering, so that we
don't think we've seen the decl at this point. This means we may not
have an appropriate entity to stream for this walk; to prevent any
potential issues with merging we stream a NULL_TREE 'hole' in the vector
and handle this carefully on import.
This requires a small amount of testsuite adjustment because we no
longer diagnose errors we used to. Because our ABI for inline variables
with dynamic initialization is to just do the initialization in the
module's initializer function (and importers only perform the static
initialization) we don't bother to walk the definition of inline
variables containing lambdas and so don't see the exposures, despite
us considering TU-local entities in static initializers of inline
variables being exposures (see PR c++/119551). This is legal by the
current wording of the standard, which does not consider the definition
of any variable to be an exposure (even an inline one).
PR c++/122994
gcc/cp/ChangeLog:
* module.cc (depset::disc_bits): New flag
DB_IGNORED_EXPOSURE_BIT.
(depset::is_ignored_exposure_context): New getter.
(depset::hash::ignore_tu_local): Rename to...
(depset::hash::ignore_exposure): ...this, and make private.
(depset::hash::hash): Rename ignore_tu_local.
(depset::hash::ignore_exposure_if): New function.
(trees_out::decl_value): Don't build deps for keyed entities.
(trees_in::decl_value): Handle missing keys.
(trees_out::write_function_def): Use ignore_exposure_if.
(trees_out::write_var_def): Likewise.
(trees_out::write_class_def): Likewise.
(depset::hash::make_dependency): Set DB_IGNORED_EXPOSURE_BIT if
appropriate, or clear it otherwise.
(depset::hash::add_dependency): Rename ignore_tu_local.
(depset::hash::find_dependencies): Set ignore_exposure if in
such a context.
gcc/testsuite/ChangeLog:
* g++.dg/modules/internal-17_b.C: Use functions and internal
types rather than lambdas.
* g++.dg/modules/internal-4_b.C: Correct expected result.
* g++.dg/modules/internal-20_a.C: New test.
* g++.dg/modules/internal-20_b.C: New test.
* g++.dg/modules/internal-20_c.C: New test.
* g++.dg/modules/internal-21_a.C: New test.
* g++.dg/modules/internal-21_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Steve Kargl [Mon, 22 Dec 2025 02:32:46 +0000 (18:32 -0800)]
fortran [PR122957] DTIO incompatibility with -fdefault-integer-8
The -fdefault-integer-8 option is optional to assist with legacy
fortran codes. It is not a Standard requirement and is not
compatible with the newer user defined derived type I/O.
PR fortran/122957
gcc/fortran/ChangeLog:
* interface.cc (gfc_match_generic_spec): Issue an error
so that users do not use -fdefault-integer-8 with DTIO.
Harald Anlauf [Mon, 22 Dec 2025 20:05:29 +0000 (21:05 +0100)]
Fortran: fix variable definition context checks for SELECT TYPE [PR123253]
Commit r16-6300 introduced a regression when checking the variable
definition context of SELECT TYPE variables where the selector was not a
dummy argument as the scan for the association target was too shallow.
Scan through association lists for the ultimate selector.
PR fortran/123253
gcc/fortran/ChangeLog:
* expr.cc (gfc_check_vardef_context): Replace simple check by a
scan through the association targets for a dummy argument.
gcc/testsuite/ChangeLog:
* gfortran.dg/associate_76.f90: Extended testcase.
* gfortran.dg/associate_77.f90: New test.
Tomasz Kamiński [Mon, 22 Dec 2025 10:53:45 +0000 (11:53 +0100)]
libstdc++/doc: Document generate_canonical and variant compat macros.
The _GLIBCXX_USE_OLD_GENERATE_CANONICAL was introduced by r16-6177-g866bc8a9214b1d that implemented P0952R2 [1] resolution
for LWG2524 as DR against C++20.
The _GLIBCXX_USE_VARIANT_CXX17_OLD_ABI was introduced by r16-6301-gb3c167b61fd75f that resolved PR112591.
Eric Botcazou [Mon, 22 Dec 2025 17:50:59 +0000 (18:50 +0100)]
Ada: Fix bogus component visibility error for class-wide type in generic
The problem is that Analyze_Overloaded_Selected_Component does:
-- If the prefix is a class-wide type, the visible components
-- are those of the base type.
if Is_Class_Wide_Type (T) then
T := Etype (T);
end if;
and Resolve_Selected_Component does:
-- The visible components of a class-wide type are those of
-- the root type.
if Is_Class_Wide_Type (T) then
T := Etype (T);
end if;
while Analyze_Selected_Component does:
-- For class-wide types, use the entity list of the root type
if Is_Class_Wide_Type (Prefix_Type) then
Type_To_Use := Root_Type (Prefix_Type);
end if;
when faced with a selected component. So the 3rd goes to the root type, the
1st to the base type, and the 2nd wants to do like the 3rd but ends up doing
like the 1st! This does not change anything for the class-wide type itself,
but does for its class-wide subtypes. The correct processing is the 3rd.
gcc/ada/
PR ada/123185
* sem_ch4.adb (Analyze_Overloaded_Selected_Component): Go to the
root when the prefix has a class-wide type.
* sem_res.adb (Resolve_Selected_Component): Likewise.
gcc/testsuite/
* gnat.dg/specs/class_wide1.ads: New test.
Jeff Law [Mon, 22 Dec 2025 17:54:05 +0000 (10:54 -0700)]
[RISC-V][V2] Improve spill code for RVV slightly to fix regressions after recent changes
Surya's recent patch for hard register propagation has caused regressions on
the RISC-V port for the various spill-* testcases. After reviewing the newer
generated code it was clear the new code was worse.
The core problem is we have a copy insn that is not frame related (and should
not be frame related) and a use of the destination of the copy in an insn that
is frame related. Prior to Surya's change we could propagate away the copy,
but not anymore.
Ideally we'd just avoid generating the copy entirely, but the structure of the
code to legitimize a poly_int isn't well suited for that. So instead we have
the code signal that it created a trivial copy and we try to optimize the code
after creation, but well before regcprop would have run. That fixes the code
quality aspect of the regression. In fact, it looks like the code can at times
be slightly better, but I didn't track down the precise reason why we were able
to re-use the read of VLEN so much better than before.
The optimization step is pretty simple. When it's been signaled that a copy was
generated, look back one insn and change it from writing the scratch register
to write the final destination instead.
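As a toy model of that step (a sketch only; Insn and fold_trailing_copy are
hypothetical and bear no relation to the RTL data structures or the RISC-V
backend code), assume the sequence ends with a trivial register copy and that
the scratch register is not needed afterwards:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction: an opcode name, a destination and a source register.
struct Insn
{
  std::string op;
  int dest;
  int src;
};

// If the sequence ends with a copy "mv final, scratch" and the previous
// insn writes scratch, make that insn write final directly and drop the
// copy. Returns true when the sequence was changed. Assumes scratch is
// dead after the copy.
bool fold_trailing_copy(std::vector<Insn> &seq)
{
  if (seq.size() < 2)
    return false;

  const Insn copy = seq.back();
  Insn &prev = seq[seq.size() - 2];
  if (copy.op != "mv" || prev.dest != copy.src)
    return false;

  prev.dest = copy.dest;  // write the final destination instead
  seq.pop_back();         // the copy is no longer needed
  return true;
}
```

Doing this at creation time means nothing later has to propagate the copy
away.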
That triggers the need to generalize the testcases so that they don't use
specific registers. We can also see the csr reads of the VLEN register getting
CSE'd more often in those testcases, so they're adjusted for that change as
well. There's some hope this will improve spill code more generally -- I
haven't really evaluated that, but I do know that when we spill vector
registers, the resulting code seems to have a lot of redundant VLEN reads.
Anyway, bootstrapped and regression tested on riscv (BPI and Pioneer). It's
also been through rv32 and rv64 regression testing. It doesn't fix all the
regressions for RISC-V on the trunk because (of course) something new got
introduced this week ;(
[ This is the spill-7 part of my last commit. After reviewing the logs from
the pre-commit system, it's good. ]