Vineet Gupta [Mon, 22 Dec 2025 16:54:10 +0000 (08:54 -0800)]
ifcvt: cond zero arith: handle subreg for shift count
Some backends, RISC-V included, wrap shift counts in a subreg, which the
current cond zero arith code wasn't handling.
This came up when looking at the original submission of cond zero
arith, which did handle subregs; that handling was omitted for initial
simplicity and then got lost along the way.
Vineet Gupta [Mon, 22 Dec 2025 16:54:06 +0000 (08:54 -0800)]
ifcvt: cond zero arith: elide short forward branch for signed GE 0 comparison [PR122769]
       Before        |        After
---------------------+----------------------
bge a0,zero,.L2      | slti a0,a0,0
                     | czero.eqz a0,a0,a0
xor a1,a1,a3         | xor a0,a0,a0
.L2                  |
mv a0,a1             |
ret                  | ret
This is what all the previous NFC patches have been preparing for.
Currently the cond arith code only handles EQ/NE zero conditions, missing
the ifcvt optimization for cases such as GE zero, as shown in the example above.
This is due to a limitation of noce_emit_czero (), so switch to
noce_emit_cmove (), which can handle conditions other than EQ/NE and,
if needed, generate additional supporting insns such as SLT.
This also allows us to remove the constraint at the entry that limited it to
EQ/NE conditions, improving ifcvt outcomes in general.
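As a rough illustration of the transformation in the table above, the following C++ sketch models both forms. The helper `czero_eqz` is a hypothetical software model of the Zicond `czero.eqz` instruction (zero when the condition operand is zero, otherwise the value operand). The function names and the source shape "xor only when a is negative" are assumptions chosen to match the `bge`/`xor` sequence; they are not taken from the patch or its testcase, and the register assignment in the table is not reproduced.

```cpp
#include <cstdint>

// Hypothetical model of Zicond czero.eqz: yields 0 when cond == 0, else val.
constexpr int64_t czero_eqz(int64_t val, int64_t cond) {
    return cond == 0 ? 0 : val;
}

// Branchy form: the xor is skipped when a >= 0 (the "bge a0,zero,.L2" path).
constexpr int64_t xor_if_negative_branchy(int64_t a, int64_t b, int64_t c) {
    if (a >= 0)
        return b;
    return b ^ c;
}

// Branchless form: a set-less-than materializes the condition, czero.eqz
// zeroes the xor operand when the condition is false, and b ^ 0 == b.
constexpr int64_t xor_if_negative_branchless(int64_t a, int64_t b, int64_t c) {
    const int64_t is_neg = a < 0 ? 1 : 0;         // models slti
    const int64_t masked = czero_eqz(c, is_neg);  // models czero.eqz
    return b ^ masked;                            // models xor
}
```

Both functions should agree for every input; the branchless one relies only on the identity b ^ 0 == b.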
PR target/122769
gcc/ChangeLog:
* ifcvt.cc (noce_try_cond_zero_arith): Use noce_emit_cmove.
Delete noce_emit_czero () no longer used.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr122769.c: New test.
Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Vineet Gupta [Mon, 22 Dec 2025 16:52:07 +0000 (08:52 -0800)]
ifcvt: cond zero arith: opencode helper noce_bbs_ok_for_cond_zero_arith [NFC]
This makes the code more readable by eliminating a bunch of pointer
intermediaries which obfuscate if_info items needed later in
noce_try_cond_zero_arith (). And while here add some top level comments
about what cond zero arith actually does.
gcc/ChangeLog:
* ifcvt.cc (noce_bbs_ok_for_cond_zero_arith): Move logic out.
(noce_try_cond_zero_arith): Into here.
Jeff Law [Mon, 22 Dec 2025 16:47:26 +0000 (09:47 -0700)]
[RISC-V][V2] Improve spill code for RVV slightly to fix regressions after recent changes
Surya's recent patch for hard register propagation has caused regressions on
the RISC-V port for the various spill-* testcases. After reviewing the newer
generated code it was clear the new code was worse.
The core problem is we have a copy insn that is not frame related (and should
not be frame related) and a use of the destination of the copy in an insn that
is frame related. Prior to Surya's change we could propagate away the copy,
but not anymore.
Ideally we'd just avoid generating the copy entirely, but the structure of the
code to legitimize a poly_int isn't well suited for that. So instead we have
the code signal that it created a trivial copy and we try to optimize the code
after creation, but well before regcprop would have run. That fixes the code
quality aspect of the regression. In fact, it looks like the code can at times
be slightly better, but I didn't track down the precise reason why we were able
to re-use the read of VLEN so much better than before.
The optimization step is pretty simple. When it's been signaled that a copy was
generated, look back one insn and change it from writing the scratch register
to write the final destination instead.
That triggers the need to generalize the testcases so that they don't use
specific registers. We can also see the csr reads of the VLEN register getting
CSE'd more often in those testcases, so they're adjusted for that change as
well. There's some hope this will improve spill code more generally -- I
haven't really evaluated that, but I do know that when we spill vector
registers, the resulting code seems to have a lot of redundant VLEN reads.
Anyway, bootstrapped and regression tested on riscv (BPI and Pioneer). It's
also been through rv32 and rv64 regression testing. It doesn't fix all the
regressions for RISC-V on the trunk because (of course) something new got
introduced this week ;(
I didn't include the spill-7.c change from either version of the patch. It
didn't fix the regression in pre-commit CI, so I'll chase that down
independently.
gcc/
* config/riscv/riscv.cc (riscv_expand_mult_with_const_int): Signal
when this creates a simple copy that may be optimized.
(riscv_legitimate_poly_move): Try to optimize away any copy created
by riscv_expand_mult_with_const_int.
* a68-parser-scanner.cc (a68_file_size): Fix comment to mention
it accepts `FILE *' and not file descriptor.
Fix invocation of `lseek' to correctly revert position of file
offset to previous one.
Harald Anlauf [Sun, 21 Dec 2025 22:03:28 +0000 (23:03 +0100)]
fortran: fix testsuite regression for gfortran.dg/value_9.f90 [PR123201]
Commit r16-3499 introduced a regression on targets where truncation of a
string argument passed to a CHARACTER(len=1),VALUE dummy argument missed
the special treatment needed for passing single characters.
PR fortran/123201
gcc/fortran/ChangeLog:
* trans-expr.cc (conv_dummy_value): Convert string of length 1 to a
single character for passing as actual argument.
Jerry DeLisle [Sun, 21 Dec 2025 21:33:15 +0000 (13:33 -0800)]
fortran: [PR121472] Fix ICE with constructor for finalized zero-size type.
When a derived type has a final subroutine and a constructor interface,
but is effectively zero-sized, the gimplifier fails on the finalization
code. The existing check for empty types (!derived->components) only
catches completely empty types, not types with empty components.
Replace with a tree-level TYPE_SIZE_UNIT check that catches all
zero-size cases.
PR fortran/121472
gcc/fortran/ChangeLog:
* trans.cc (gfc_finalize_tree_expr): Replace !derived->components
check with TYPE_SIZE_UNIT check for zero-size types.
Tamar Christina [Sun, 21 Dec 2025 08:27:13 +0000 (08:27 +0000)]
vect: use wider precision type for generating early break scalar IV [PR123089]
In the PR we see that the new scalar IV tricks other passes into thinking
there's an overflow due to the use of a signed counter:
The loop is known to iterate 8191 times and we have a VF of 8 and it starts
at 2.
The codegen out of the vectorizer is the same as before, except we now have a
scalar variable counting the scalar iteration count vs a vector one.
i.e. we have
_45 = _39 + 8;
vs
_46 = _45 + { 16, 16, 16, 16, ... }
We pick a lower VF now since costing allows it, but that's not important.
When we get to cunroll since the value is now scalar, it sees that 8 * 8191
would overflow a signed short and so it changes the loop bounds to the largest
possible signed value and then uses this to elide the ivtmp_50 < 8191 as always
true and so you get an infinite loop:
Analyzing # of iterations of loop 1
exit condition [1, + , 1](no_overflow) < 8191
bounds on difference of bases: 8190 ... 8190
result:
# of iterations 8190, bounded by 8190
Statement (exit)if (ivtmp_50 < 8191)
is executed at most 8190 (bounded by 8190) + 1 times in loop 1.
Induction variable (signed short) 8 + 8 * iteration does not wrap in statement
_45 = _39 + 8;
in loop 1.
Statement _45 = _39 + 8;
is executed at most 4094 (bounded by 4094) + 1 times in loop 1.
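The arithmetic behind that conclusion can be checked with a small C++ sketch. It assumes a 16-bit `short` (as the `signed short` IV in the dump suggests) and uses the hypothetical helper names `fits_signed` and `fits_unsigned`; it only illustrates the range check on 8 * 8191, not the actual logic of the vectorizer or cunroll.

```cpp
#include <cstdint>

// True if value fits in a signed two's-complement integer of `bits` bits.
constexpr bool fits_signed(int64_t value, unsigned bits) {
    const int64_t max = (int64_t{1} << (bits - 1)) - 1;
    const int64_t min = -max - 1;
    return value >= min && value <= max;
}

// True if a non-negative value fits in an unsigned integer of `bits` bits.
constexpr bool fits_unsigned(int64_t value, unsigned bits) {
    return value >= 0 && value <= (int64_t{1} << bits) - 1;
}

// The scalar IV advances by 8 per iteration; the value the text says
// cunroll considers is 8 * 8191.
constexpr int64_t iv_upper_bound = int64_t{8} * 8191;
```

With these helpers, `fits_signed(iv_upper_bound, 16)` is false while `fits_unsigned(iv_upper_bound, 16)` is true, which is why the signedness of a 16-bit counter matters here.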
The signed type was originally chosen because of the negative offset we use when
adjusting for peeling for alignments with masks. However this then introduces
issues as we see here with signed overflow. This patch instead determines the
smallest possible unsigned type for use by the scalar IV where the overflow
won't happen when we include the extra bit for the sign, i.e. if the scalar IV
is an unsigned 8-bit value we pick a signed 16-bit type, but if it is a signed
8-bit value we pick an unsigned 8-bit type.
We use the initial niters value to determine the smallest size possible, to
prevent cases such as a 64-bit IV in the code needing a TImode
counter. I also only require the additional bit when I know we'll be generating
the SMAX. I've now moved this to vectorizable_early_exit so that if we do
end up needing something like TImode, we don't vectorize if the target
doesn't support it.
I've also added some testcases for masking around the boundary values. I've
only added them for char to reduce the runtime of the tests.
gcc/ChangeLog:
PR tree-optimization/123089
* tree-vect-loop.cc (vect_update_ivs_after_vectorizer_for_early_breaks):
Add conversion if required.  Note that if we did truncate, the original
scalar loop had an overflow here anyway.
(vect_get_max_nscalars_per_iter): Expose.
* tree-vect-stmts.cc (vect_compute_type_for_early_break_scalar_iv): New.
(vectorizable_early_exit): Find smallest type where we won't have UB in
the signed IV and store it.
* tree-vectorizer.h (LOOP_VINFO_EARLY_BRK_IV_TYPE): New.
(class _loop_vec_info): Add early_break_iv_type.
(vect_min_prec_for_max_niters): New.
* tree-vect-loop-manip.cc (vect_do_peeling): Use it.
gcc/testsuite/ChangeLog:
PR tree-optimization/123089
* gcc.dg/vect/vect-early-break_141-pr123089.c: New test.
* gcc.target/aarch64/sve/peel_ind_14.c: New test.
* gcc.target/aarch64/sve/peel_ind_14_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_15.c: New test.
* gcc.target/aarch64/sve/peel_ind_15_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_16.c: New test.
* gcc.target/aarch64/sve/peel_ind_16_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_17.c: New test.
* gcc.target/aarch64/sve/peel_ind_17_run.c: New test.
Andrew Pinski [Sat, 20 Dec 2025 20:00:36 +0000 (12:00 -0800)]
extension: Fix documentation for __builtin_*_overflow_p [PR123222]
This fixes the copy-and-pasto for these builtins.
Basically the documentation currently says "addition" as that was copied from
__builtin_add_overflow documentation but really it should say corresponding operation
instead.
Pushed as obvious.
PR middle-end/123222
gcc/ChangeLog:
* doc/extend.texi: Fix copy-and-pasto for __builtin_*_overflow_p.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jose E. Marchesi [Sat, 20 Dec 2025 14:59:50 +0000 (15:59 +0100)]
a68: fix layout of incomplete types
Apparently there is some case where the c_union of a union may be
incomplete and the containing union complete. At this point I don't
fully understand how that is possible, and the laying out of modes
should probably be rethought, but for now fix this corner case.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
* a68-low-moids.cc (a68_lower_moids): Fix for layout of
incomplete types.
Nathaniel Shead [Fri, 14 Nov 2025 23:34:36 +0000 (10:34 +1100)]
c++: Implement dependent ADL for use with modules
[module.global.frag] p3.3 says "A declaration D is decl-reachable from a
declaration S in the same translation unit if ... S contains a dependent
call E ([temp.dep]) and D is found by any name lookup performed for an
expression synthesized from E by replacing each type-dependent argument
or operand with a value of a placeholder type with no associated
namespaces or entities".
This requires doing partial ADL on dependent calls, in case there are
non-dependent arguments that would cause new functions to become
decl-reachable. This patch implements this with an additional lookup
during modules streaming to find any such entities.
This causes us to do ADL in more circumstances; this means also that we
might instantiate templates in cases we didn't use to. This could cause
issues given we have already started our modules walk at this point, or
break any otherwise valid existing code. To fix this, this patch adds a flag
to do a "tentative" ADL pass which doesn't attempt to complete any types
(and hence cause instantiations to occur); this means that we might miss
some associated entities however. During a tentative walk we can also
skip entities that we know won't contribute to the missing
decl-reachable set, as an optimisation.
One implementation limitation is that both modules tree walking and
name lookup mark tree nodes as TREE_VISITED for different purposes; to
avoid conflicts this patch caches calls that will require lookup in a
separate worklist to be processed after the walk is done.
PR c++/122712
gcc/cp/ChangeLog:
* module.cc (depset::hash::dep_adl_info): New type.
(depset::hash::dep_adl_entity_list): New work list.
(depset::hash::hash): Create it.
(depset::hash::~hash): Release it.
(trees_out::tree_value): Cache possibly dependent
calls during tree walk.
(depset::hash::add_dependent_adl_entities): New function.
(depset::hash::find_dependencies): Process cached entities.
* name-lookup.cc (name_lookup::tentative): New member.
(name_lookup::name_lookup): Initialize it.
(name_lookup::preserve_state): Propagate tentative from previous
lookup.
(name_lookup::adl_namespace_fns): Don't search imported bindings
during tentative lookup.
(name_lookup::adl_class): Don't attempt to complete class types
during tentative lookup.
(name_lookup::search_adl): Skip type-dependent args and avoid
unnecessary work during tentative lookup.
(lookup_arg_dependent): Add tentative parameter.
* name-lookup.h (lookup_arg_dependent): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/modules/adl-12_a.C: New test.
* g++.dg/modules/adl-12_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Jakub Jelinek [Sat, 20 Dec 2025 11:04:36 +0000 (12:04 +0100)]
c++: Ignore access in is_implicit_lifetime trait decisions [PR122690]
I've implemented the non-aggregate part of is_implicit_lifetime
paper according to the paper's comment how it can be implemented, i.e.
the std::conjunction from
template<typename T>
struct is_implicit_lifetime : std::disjunction<
std::is_scalar<T>,
std::is_array<T>,
std::is_aggregate<T>,
std::conjunction<
std::is_trivially_destructible<T>,
std::disjunction<
std::is_trivially_default_constructible<T>,
std::is_trivially_copy_constructible<T>,
std::is_trivially_move_constructible<T>>>> {};
in the paper. But as reported in PR122690, the actual wording in the
paper is different from that, the
https://eel.is/c++draft/class.prop#16.2 part of it:
"it has at least one trivial eligible constructor and a trivial,
non-deleted destructor" doesn't say anything about accessibility
of those ctors or dtors, only triviality, not being deleted and eligibility.
My understanding is that GCC handles the last 2 bullets of
https://eel.is/c++draft/special#6 by not adding ctors ineligible because
of those into the overload at all, and for testing deleted cdtors
I need to lazily declare them in case such synthetization makes them
deleted.
So, this patch first checks for the easy cases (where the flags on the
type say the dtor is non-trivial or all the 3 special member ctors are
non-trivial) and if not, lazily declares them if needed and checks if they
are trivial and non-deleted.
2025-12-20 Jakub Jelinek <jakub@redhat.com>
PR c++/122690
* tree.cc (implicit_lifetime_type_p): Don't test is_trivially_xible,
instead try to lazily declare dtor and default, copy and move ctors
if needed and check for their triviality and whether they are
deleted.
Jakub Jelinek [Sat, 20 Dec 2025 10:59:19 +0000 (11:59 +0100)]
i386: Fix up handling of some -mno-avx512* options [PR123216]
This PR is about -mavx10.2 -mno-avx512vl ICE on some builtin.
Though, because -mavx10.2 implies -mavx512vl (among many others), the
pattern is right and doesn't need to care about such weird cases.
What is wrong is the handling of -mno-avx512vl and various other options,
that should unset -mavx10.1 and that should unset -mavx10.2, but it doesn't.
I went through various ISAs which 10.1 or 10.2 implies, looking for the
ISA{,2}_*_SET and corresponding ISA{,2}_*_UNSET macros and their use or lack
thereof.
Here is what I found.
OPTION_MASK_ISA_AVX512FP16_UNSET has been incorrectly defined (avx512fp16
implies avx512bw, not the other way around), but fortunately wasn't used.
And then various ISAs implied by -mavx10.1 (except for -mavx512f which was
correct) missed clearing -mavx10.{1,2} on -mno-* handling.
As mentioned in the PR, it would be really nice to add some verification of
the set and unset macros to verify consistency.
2025-12-20 Jakub Jelinek <jakub@redhat.com>
PR target/123216
* common/config/i386/i386-common.cc (OPTION_MASK_ISA_AVX512FP16_UNSET):
Remove unused macro.
(OPTION_MASK_ISA2_AVX512FP16_UNSET, OPTION_MASK_ISA2_AVX512BF16_UNSET,
OPTION_MASK_ISA2_AVX512BW_UNSET): Or in OPTION_MASK_ISA2_AVX10_1_UNSET.
(OPTION_MASK_ISA2_AVX512CD_UNSET, OPTION_MASK_ISA2_AVX512DQ_UNSET,
OPTION_MASK_ISA2_AVX512VL_UNSET, OPTION_MASK_ISA2_AVX512IFMA_UNSET,
OPTION_MASK_ISA2_AVX512VNNI_UNSET,
OPTION_MASK_ISA2_AVX512VPOPCNTDQ_UNSET,
OPTION_MASK_ISA2_AVX512VBMI_UNSET, OPTION_MASK_ISA2_AVX512VBMI2_UNSET,
OPTION_MASK_ISA2_AVX512BITALG_UNSET): Define.
(ix86_handle_option): For
-mno-avx512{cd,dq,vl,ifma,vnni,vpopcntdq,vbmi,vbmi2,bitalg} also remove
corresponding OPTION_MASK_ISA2_AVX512*_UNSET from ix86_isa_flags2
and add it to ix86_isa_flags2_explicit.
Jakub Jelinek [Sat, 20 Dec 2025 10:58:25 +0000 (11:58 +0100)]
i386: Fix up expansion of 2 keylocker and one user_msr builtin [PR123217]
target can be a MEM, especially at -O0, not just a REG, and most of the
ix86_expand_builtin spots which use target and can't support MEM
destinations deal with it properly, except these 3 spots don't.
Fixed thusly, when we change target to a new pseudo, the caller will
take care of storing that pseudo into the MEM, and this is the
solution other spots with similar requirements use in the function.
2025-12-20 Jakub Jelinek <jakub@redhat.com>
PR target/123217
* config/i386/i386-expand.cc (ix86_expand_builtin)
<case IX86_BUILTIN_ENCODEKEY128U32, case IX86_BUILTIN_ENCODEKEY256U32,
case IX86_BUILTIN_URDMSR>: Set target to a new pseudo even if it is
non-NULL but doesn't satisfy register_operand predicate.
* gcc.target/i386/keylocker-pr123217.c: New test.
* gcc.target/i386/user_msr-pr123217.c: New test.
The `recompute_dominator' function used in the code fragment within
this patch assumes correctness in the rest of the CFG. Consequently,
it is wrong to rely upon it before the subsequent updates are made in
the "Update dominators for multiple exits" loop in the function.
Furthermore, if `loop_exit' == `scalar_exit', the "Update dominators for
multiple exits" logic will already take care of updating the
dominator for `scalar_exit->dest', such that the moved statement is
unnecessary.
gcc/ChangeLog:
PR tree-optimization/123152
* tree-vect-loop-manip.cc
(slpeel_tree_duplicate_loop_to_edge_cfg): Correct order of
dominator update.
Jakub Jelinek [Fri, 19 Dec 2025 22:10:36 +0000 (23:10 +0100)]
fortran, openmp: Add default: clause in order to avoid -Wmaybe-uninitialized warning
While the enum has only 4 enumerators and all of them are listed, in theory values
with the enum type could contain other values and so without default:
-Wmaybe-uninitialized warning for the s variable can happen.
2025-12-19 Jakub Jelinek <jakub@redhat.com>
* dump-parse-tree.cc (show_omp_clauses): Add default: with
gcc_unreachable () to avoid spurious -Wmaybe-uninitialized warnings.
Jakub Jelinek [Fri, 19 Dec 2025 22:09:06 +0000 (23:09 +0100)]
Some further comment typos
This patch attempts to fix various comment typos (inspired by Gemini AI
producing a list of typos for the dwarf2out.cc, gimplify.cc and combine.cc
files, followed by a manual grep for all the occurrences and changing them
case by case; e.g. there was one correct recourse use elsewhere, I believe).
Tomasz Kamiński [Thu, 11 Dec 2025 09:43:44 +0000 (10:43 +0100)]
libstdc++: Use union to store non-trivially destructible types in C++17 mode [PR112591]
This patch disables use of specialization _Uninitialized<_Type, false> for
non-trivially destructible types by default in C++17, and falls back to
the primary template, which stores the type in a union directly. This makes the
ABI consistent between C++17 and C++20 (or later). This partial specialization
is no longer required after the changes introduced in r16-5961-g09bece00d0ec98.
This fixes non-conformance in C++17 mode where global variables of a variant
specialization type were not statically initialized for non-trivially
destructible types, even if initialization of the selected alternative could
be performed at compile time. For illustration, the following global variable
will be statically initialized after this change:
std::variant<std::unique_ptr<T>, std::unique_ptr<U>> ptr;
This constitutes an ABI break, and changes the layout of types that use
the same non-trivially copyable type both as a base class and as an
alternative of a variant object that is the first member:
struct EmptyNonTrivial { ~EmptyNonTrivial(); };
struct Affected : EmptyNonTrivial {
std::variant<EmptyNonTrivial, char> mem; // mem was at offset zero,
// will use non-zero offset now
};
After the change, the layout of such types is consistent with the one used for
empty types with a trivial destructor, or the one used for any empty type in
C++20 or later.
For programs affected by this change, it can be reverted in C++17 mode, by
defining _GLIBCXX_USE_VARIANT_CXX17_OLD_ABI. However, presence of this macro
has no effect in C++20 or later modes.
PR libstdc++/112591
libstdc++-v3/ChangeLog:
* include/std/variant (_Uninitialized::_M_get, __get_n)
(_Uninitialized<_Type, false>): Add _GLIBCXX_USE_VARIANT_CXX17_OLD_ABI
check to preprocessor guard.
* testsuite/20_util/variant/112591.cc: Updated tests.
* testsuite/20_util/variant/112591_compat.cc: New test.
* testsuite/20_util/variant/constinit.cc: New test.
* testsuite/20_util/variant/constinit_compat.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Harald Anlauf [Fri, 19 Dec 2025 20:15:44 +0000 (21:15 +0100)]
Fortran: INTENT(IN) polymorphic argument with pointer components [PR71565]
PR fortran/71565
gcc/fortran/ChangeLog:
* expr.cc (gfc_check_vardef_context): Fix treatment of INTENT(IN)
checks for ASSOCIATE variables. Correct checking of PROTECTED
objects, as subobjects inherit the PROTECTED attribute.
gcc/testsuite/ChangeLog:
* gfortran.dg/protected_8.f90: Adjust patterns.
* gfortran.dg/associate_76.f90: New test.
Robin Dapp [Thu, 18 Dec 2025 10:19:57 +0000 (11:19 +0100)]
RISC-V: Fix overflow check in interleave pattern [PR122970].
In the pattern where we interpret and code-gen two interleaving series as if
they were represented in a larger type we check for overflow.
The overflow check is basically
if (base + (nelems - 1) * step >> inner_bits != 0)
overflow = true;
In the PR, base is negative and we interpret it as a negative uint64
value. Thus, e.g. base + (nelems - 1) * step = -32 + 7 * 8 = 24.
24 fits uint8 and we wrongly assume that no overflow happens.
This patch reinterprets base as type of inner bit size which makes the
overflow check work.
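The failure mode can be reproduced with a small C++ sketch. The values nelems = 8, step = 8 and inner_bits = 8 are chosen to match the -32 + 7 * 8 = 24 example; `overflows` and `as_inner` are hypothetical names, and masking to the low bits is only one possible way to model "reinterpret base as the inner type", not necessarily what riscv-v.cc does.

```cpp
#include <cstdint>

// Overflow check as described: any bits at or above inner_bits mean the
// last series element does not fit the inner element type.
// Assumes inner_bits < 64.
constexpr bool overflows(uint64_t base, uint64_t nelems, uint64_t step,
                         unsigned inner_bits) {
    return ((base + (nelems - 1) * step) >> inner_bits) != 0;
}

// Model of reinterpreting base in the inner element width: keep only the
// low inner_bits bits of its two's-complement representation.
constexpr uint64_t as_inner(int64_t base, unsigned inner_bits) {
    return static_cast<uint64_t>(base) & ((uint64_t{1} << inner_bits) - 1);
}
```

With base = -32 passed as a plain 64-bit value, the sum wraps around to 24 and `overflows` reports false. After `as_inner(-32, 8)` the base becomes 224, the sum is 280, and the check reports the overflow.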
PR target/122970
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector_interleaved_stepped_npatterns):
Reinterpret base as smaller type.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add rvv_zvl128b_ok.
* gcc.target/riscv/rvv/autovec/pr122970.c: New test.
Robin Dapp [Thu, 13 Nov 2025 08:23:40 +0000 (09:23 +0100)]
RISC-V: Generic vec_extract via subreg.
We are missing several vec_extract chances because the current autovec
patterns are not comprehensive. In particular we don't extract from
pseudo-VLA modes that are actually VLS modes (just VLA modes in name).
Rather than add even more mode combinations to vec_extract, this patch
uses a dynamic approach in legitimize_move. At that point we can just check
if the mode sizes make sense and then emit the same code as before.
This is not the ideal solution as the middle-end and the vectorizer in
particular queries the vec_extract optab for support and won't emit
certain code sequences if it's not present (e.g. in VMAT_STRIDED_SLP
or when trying intermediate-sized vectors in a chain).
For simple BIT_FIELD_REFs it works, though.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vector_subreg_extract): New
function that checks for and performs "vector extracts".
(legitimize_move): Call new function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/subreg-extract.c: New test.
Robin Dapp [Thu, 6 Nov 2025 12:16:40 +0000 (13:16 +0100)]
RISC-V: Add VLS modes to autovec iterators.
In order to allow more VLS vectorization, add more VLS modes to the
autovec expanders, as well as some missing VLS modes that I encountered while
testing.
Robin Dapp [Thu, 6 Nov 2025 16:43:58 +0000 (17:43 +0100)]
RISC-V: Change gather/scatter iterators.
This patch changes the gather/scatter mode iterators from a ratio
scheme to a more direct one where the index mode size is
1/2, 1/4, 1/8, 2, 4, 8 times the data mode size. It also adds VLS modes
to the iterators and removes the now unnecessary
gather_scatter_valid_offset_p.
Robin Dapp [Mon, 15 Dec 2025 10:20:54 +0000 (11:20 +0100)]
vect: Fix scale-only pass in vect_gather_scatter_fn_p [PR123118].
In the process of refactoring the gather/scatter rework this likely got
lost. In the "third pass" we look for a configuration with a smaller
scale and a larger offset type with the same signedness. We want to be
able to multiply the offset by the new scale but not change the offset
sign. What we actually checked is whether a converted offset type was
supported without setting *supported_offset_vectype.
This patch removes the check for the offset type change and replaces it
with a TYPE_SIGN match.
PR tree-optimization/123118
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Check that
the type sign is equal.
gcc/testsuite/ChangeLog:
* g++.target/riscv/rvv/autovec/pr123118.C: New test.
Robin Dapp [Mon, 15 Dec 2025 12:01:40 +0000 (13:01 +0100)]
forwprop: Check type conversion in pack/unpack [PR123117].
When using pack or unpack in the simplification of a vector constructor
we must make sure that the original BIT_FIELD_REF was not a sign-changing
nop conversion. If it was we cannot safely pack/unpack as that would
skip sign or zero extensions. This patch adds useless_type_conversion_p
to both paths.
PR tree-optimization/123117
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_vector_constructor):
Check if we had a nop conversion and don't use pack/unpack in
that case.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lsx/pr123117.c: New test.
Robin Dapp [Mon, 8 Dec 2025 11:18:55 +0000 (12:18 +0100)]
RISC-V: Implement cbranch_all/any.
This implements the (cond_len_)cbranch_all/_any optabs for riscv and
adds a few tests. The patch requires a small vectorizer fix before
optabs take any effect.
* gcc.target/riscv/rvv/autovec/early-break-3.c: New test.
* gcc.target/riscv/rvv/autovec/early-break-4.c: New test.
* gcc.target/riscv/rvv/autovec/early-break-5.c: New test.
Robin Dapp [Fri, 12 Dec 2025 08:52:16 +0000 (09:52 +0100)]
vect: Use type precision in reduction epilogue [PR123097].
In the PR we extract non-existent bits/elements from a vector. This is
because we use TYPE_SIZE (vectype) for a boolean vector which returns 8
instead of 4 for RVV's vector (4) <signed-boolean:1>.
The patch uses TYPE_VECTOR_SUBPARTS instead and multiplies its result
with vector_element_bits to get the proper number of elements and size.
PR tree-optimization/123097
gcc/ChangeLog:
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Calculate vector size by number of elements * bit size per
element.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr123097-run.c: New test.
* gcc.target/riscv/rvv/autovec/pr123097.c: New test.
Robin Dapp [Fri, 19 Dec 2025 18:36:35 +0000 (11:36 -0700)]
[PATCH] testsuite: Check for effective-target ctz [PR123192].
Rainer reported that ctz-ch.c fails on several platforms.
This patch adds /* { dg-require-effective-target ctz } */
to the test which looks like it does the right thing.
[PR123223, LRA]: Fix ICE of GCC built with checking rtl
The latest PR55212 patch improving dealing with scratch pseudos does not
check reload rtx on reg when recognizing scratch pseudos. This
results in failure of GCC built with checking rtl.
gcc/ChangeLog:
PR rtl-optimization/123223
* lra-constraints.cc (match_reload, curr_insn_transform): Check
rtx on REG when testing scratch pseudos.
Jeff Law [Fri, 19 Dec 2025 17:13:24 +0000 (10:13 -0700)]
[committed] Improve shift loops on the H8
Inspired by Georg-Johann's work on the AVR to convert the shift loops to a
sentinel approach and a rough work week, I revisited the shift patterns on the
H8 to see if we could improve things on that port as well. It also serves as a
good verification that things are working in my environment.
The basic idea of Georg-Johann's patch is to clear the bits that are going to
be shifted away, then turn on a sentinel bit (the last shifted away bit). This
is done outside the loop. The loop then iterates until the sentinel bit shows
up in C. This eliminates decrementing the loop counter and improves performance.
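The sentinel idea can be modelled in portable C++ as a sketch. It emulates a 16-bit logical right shift and tracks the bit shifted out in a `carry` variable standing in for the C flag; the function name and the 16-bit width are assumptions for illustration, and this is not the H8 or AVR assembly the patch emits.

```cpp
#include <cassert>
#include <cstdint>

// Logical right shift of a 16-bit value by n (1 <= n <= 15) using a
// sentinel bit instead of a loop counter.
uint16_t shift_right_sentinel(uint16_t x, unsigned n) {
    assert(n >= 1 && n <= 15);
    const uint16_t low_mask = static_cast<uint16_t>((1u << n) - 1);
    // Clear the bits that are going to be shifted away.
    x = static_cast<uint16_t>(x & ~low_mask);
    // Turn on the sentinel: the last bit that will be shifted out.
    x = static_cast<uint16_t>(x | (1u << (n - 1)));
    bool carry;
    do {
        carry = (x & 1u) != 0;              // bit that lands in the C flag
        x = static_cast<uint16_t>(x >> 1);  // single-bit shift
    } while (!carry);                       // stop once the sentinel shows up
    return x;
}
```

All bits below the sentinel are zero, so the loop body runs exactly n times without ever touching a counter register.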
It turns out to be fairly easy to implement on the H8. The first
implementation did the clearing and setting in the most simplistic way
possible, but to avoid significant code size regressions the clearing and
setting really needed to be handled by output_logical_op which has several
short cuts. So a bit of adjustment was necessary to make output_logical_op
callable from other contexts.
Second the H8/S and newer parts have shift-by-2 instructions. These aren't
normally used in shift loops unless we're optimizing for size. This requires
slight adjustment of the sentinel location for odd shift counts. The residual
single bit shift for that case is handled outside the loop.
Otherwise it's an uneventful patch. My hope was that it would save a minuscule
amount of testing time as the H8 continues to be the slowest cross target for
testing. Hard to judge that right now -- while the latest run on the H8 was
about 30 minutes faster than any run in the last month, the machine was
unloaded for that run while it was fully loaded for the standard nightly runs.
If this even approaches 1% I'll jump for joy.
Anyway, tested on the H8 with no regressions. Given the H8 is a dead ISA with
very few users, I'm going to go ahead and commit even though we're in stage3.
gcc/
* config/h8300/h8300.cc (output_logical_op): Adjust last argument to
be a pattern, not an insn. Corresponding implementation changes.
(output_shift_loop): Extracted from output_a_shift and improved
to use a sentinel to indicate when to stop the loop.
(output_a_shift): Use output_shift_loop.
(compute_a_shift_length): Handle adjusted shift loop code.
* config/h8300/logical.md (logicals): Pass pattern to output_logical_op
rather than the full insn.
* config/h8300/h8300-protos.h (output_logical_op): Update prototype.
Jakub Jelinek [Fri, 19 Dec 2025 15:44:16 +0000 (16:44 +0100)]
c++: Suppress -Wreturn-type warnings for functions with failed assertions [PR91388]
This is something Jonathan has asked for recently. E.g. in the recent
libstdc++ r16-6177 random.tcc changes, there was
if constexpr (__d <= 32)
return __generate_canonical_any<_RealT, uint64_t, __d>(__urng);
else
{
#if defined(__SIZEOF_INT128__)
static_assert(__d <= 64,
"irregular RNG with float precision >64 is not supported");
return __generate_canonical_any<
_RealT, unsigned __int128, __d>(__urng);
#else
static_assert(false, "irregular RNG with float precision"
" >32 requires __int128 support");
#endif
}
and when we hit the static_assert there, we don't just get an error about
that, but also a -Wreturn-type warning in the same function because that
path falls through to the end of function without returning a value.
But a function with a failed static_assert is erroneous and will never
fall through to the end. We could treat failed static_assert in functions
as __builtin_unreachable (), but I think it doesn't matter where exactly
in a function static_assert(false); appears, so this patch just suppresses
-Wreturn-type warning in that function instead.
2025-12-19 Jakub Jelinek <jakub@redhat.com>
PR c++/91388
* semantics.cc (finish_static_assert): Suppress -Wreturn-type warnings
in functions with failed assertions.
Tomasz Kamiński [Thu, 18 Dec 2025 13:14:54 +0000 (14:14 +0100)]
libstdc++: Use smallest possible integer for __generate_canonical_any
If the span of the range R produced by uniform bit generator U passed to
generate_canonical is not power of two, we need to use algorithm that
requires computing power R^k that is greater than 2^d, where d is number
of digits in mantissa of _RealT. Previously we have used an integer type
that is has twice as many digits as d. This lead to situation that for
standard engines that produced such range (like std::minstd_rand0,
std::minstd_rand, std::ranlux24, ....) 256bit integer support was
required for 128bit floats. However, in this cases R^4 provides more
than d bits of precision, while requiring 124 bits.
We overestimate the number of required bits, by computing a value
l * bit_width(R) (log2(R) + 1), where l is value such that log2(R) * l >= d.
As R >= 2^log2(R), then R^l >= (2^log2(R))^l == 2^(log(R) * l) >= 2^d,
so k+1 >= l >= k. In consequence R^k is smaller R^l which require at most
l * bit_width(R). This is an overestimate, but difference should not be
higher than l bits.
We replace __gen_can_pow and __gen_can_rng_calls_needed with
__gen_canon_log(v, b), which computes the largest power of b that fits into v.
As such a number is smaller than v, the result will always fit in its type.
Both the logarithm and the power value are returned using
__gen_canon_log_res struct.
libstdc++-v3/ChangeLog:
* include/bits/random.h (__rand_uint128::operator>)
(__rand_uint128::operator>=): Define.
* include/bits/random.tcc (__generate_canonical_pow2):
Adjust for use of __rand_uint128 in C++11.
(__gen_can_pow, __gen_can_rng_calls_needed): Replace with
__gen_canon_log.
(__gen_canon_log_res, __gen_canon_log): Define.
(__generate_canonical_any): Reworked how _UInt is determined.
* testsuite/26_numerics/random/uniform_real_distribution/operators/gencanon_eng.cc:
New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
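A minimal sketch of the kind of computation __gen_canon_log is described as doing, under the assumption that "fits into v" means "does not exceed v". The names `canon_log` and `canon_log_result` are hypothetical, the sketch is limited to 64-bit values, and it is not the libstdc++ implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the result struct named in the message.
struct canon_log_result
{
  unsigned log;        // exponent k
  std::uint64_t pow;   // b^k
};

// Largest power of b that does not exceed v (requires v >= 1 and b >= 2).
constexpr canon_log_result canon_log(std::uint64_t v, std::uint64_t b)
{
  canon_log_result r{0, 1};
  // The condition r.pow * b <= v is written as a division so that the
  // product is only formed when it cannot overflow.
  while (r.pow <= v / b)
    {
      r.pow *= b;
      ++r.log;
    }
  return r;
}
```

With b = 2147483646 (the span of std::minstd_rand0) and v = 2^64 - 1, the sketch stops at k = 2, which illustrates why wider integers are needed to reach R^4.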
Jonathan Wakely [Thu, 18 Dec 2025 16:39:46 +0000 (16:39 +0000)]
libstdc++: Fix chrono::parse to read from wide strings [PR123147]
When we extract wide characters and insert them into a stringstream to
be parsed as a floating-point value, we should use a stringstream of
char, not wchar_t.
libstdc++-v3/ChangeLog:
PR libstdc++/123147
* include/bits/chrono_io.h (_Parser::operator()) <%S>: Use a
buffer of narrow characters to be parsed by std::from_chars.
* testsuite/std/time/parse/parse.cc: Check wchar_t parsing.
Tomasz Kamiński [Fri, 19 Dec 2025 12:07:00 +0000 (13:07 +0100)]
libstdc++: Make more _Safe_iterator functions constexpr in C++20.
These functions are indirectly called from flat_ container operations
(from precondition checks of lower_bound, upper_bound, ...) that were
made constexpr by r16-6026-gbf9dd44a97400e, leading to test failures
in _GLIBCXX_DEBUG mode.
For __can_advance we unconditionally return true in constant evaluation,
similarly to __valid_range. The constexpr evaluation will detect comparison
of iterators to different ranges.
libstdc++-v3/ChangeLog:
* include/debug/helper_functions.h (__gnu_debug::__can_advance):
Declare as _GLIBCXX20_CONSTEXPR.
* include/debug/safe_iterator.h (__gnu_debug::__can_advance):
Define as _GLIBCXX20_CONSTEXPR, and return true for constexpr
evaluation.
(__gnu_debug::__base): Define as _GLIBCXX20_CONSTEXPR.
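The "return true during constant evaluation" idea can be sketched as follows. This is a toy bounds check on plain indices, not the debug-mode code; `can_advance_sketch` is a hypothetical name, and `__builtin_is_constant_evaluated` is the GCC/Clang builtin that sits behind std::is_constant_evaluated.

```cpp
#include <cassert>

// Toy check: may an iterator at index pos in a range of length size be
// moved by n steps?
constexpr bool can_advance_sketch(long pos, long size, long n)
{
  // During constant evaluation the compiler itself rejects out-of-range
  // accesses, so the run-time style check is skipped.
  if (__builtin_is_constant_evaluated())
    return true;
  return n >= 0 ? n <= size - pos : -n <= pos;
}
```

At run time the bounds are checked as usual; in a constant expression the function always reports success and leaves diagnosis to the constant evaluator.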
Jason Merrill [Fri, 19 Dec 2025 08:20:13 +0000 (15:20 +0700)]
c++: lambda template arg in abbreviated template [PR117034]
A lambda used as a non-type template argument to a type-constraint in the
first parameter of an abbreviated function template was wrapped in a
TARGET_EXPR because we hadn't opened the implicit template scope yet. After
r16-5115 changed convert_template_argument to call
instantiate_non_dependent_expr_internal on non-dependent expressions, these
TARGET_EXPRs began hitting the default case in tsubst_expr, causing an ICE.
So let's enter implicit template scope as soon as we see the concept-name;
at that point we know we're declaring an abbreviated template.
PR c++/117034
gcc/cp/ChangeLog:
* parser.cc (maybe_start_implicit_template): Split out from...
(synthesize_implicit_template_parm): ...here.
(cp_parser_template_id): Call it.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/lambda-template-pr117034.C: New test.
Egas Ribeiro [Thu, 11 Dec 2025 13:30:49 +0000 (13:30 +0000)]
c++: Fix ICE with functional cast to reference in template [PR123044]
When processing a functional cast to a reference type in a template
context, build_functional_cast_1 wasn't calling convert_from_reference.
This left the expression with a reference type, which later triggered
an assertion in implicit_conversion from r15-6709 that expects
expression types to already be dereferenced.
In contrast, cp_build_c_cast already calls convert_from_reference on
the result in template contexts, so C-style casts like (R)x worked
correctly.
The fix makes functional casts consistent with C-style casts by
calling convert_from_reference before returning in the template
processing path.
PR c++/123044
gcc/cp/ChangeLog:
* typeck2.cc (build_functional_cast_1): Call convert_from_reference
on template CAST_EXPR to match C-style cast behavior.
gcc/testsuite/ChangeLog:
* g++.dg/template/implicit-func-cast.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@tecnico.ulisboa.pt> Reviewed-by: Jason Merrill <jason@redhat.com>
Egas Ribeiro [Sat, 13 Dec 2025 20:06:09 +0000 (20:06 +0000)]
c++: Fix injected-class-name lookup with multiple bases [PR122509]
When looking up an unqualified injected-class-name in a member access
expression (e.g., D().v<int>), cp_parser_lookup_name calls lookup_member
with protect=0, causing it to return NULL on ambiguity instead of the
candidate list. This prevented the existing DR 176 logic in
maybe_get_template_decl_from_type_decl from resolving the ambiguity.
Per DR 176, if all ambiguous candidates are instantiations of the same
class template and the name is followed by a template-argument-list,
the reference is to the template itself and is not ambiguous.
Fix by using protect=2 to return the ambiguous candidate list.
PR c++/122509
gcc/cp/ChangeLog:
* parser.cc (cp_parser_lookup_name): Use protect=2 instead of
protect=0 when calling lookup_member.
gcc/testsuite/ChangeLog:
* g++.dg/tc1/dr176-2.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Tobias Burnus [Fri, 19 Dec 2025 11:07:58 +0000 (12:07 +0100)]
OpenMP: uses_allocators with ';'-separated list
OpenMP 6.0 has the following wording for the uses_allocators clause:
"More than one clause-argument-specification may be specified";
this permits ';' lists. While that's pointless for predefined
allocators, for user-defined allocators it saves redundant
') uses_allocators(' by permitting:
uses_allocators( traits(t1): alloc1 ; traits(t2): alloc2 )
Additionally, the order in the tree dump has been changed to
place the modifiers before the allocator variable, matching
the input syntax.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_uses_allocators): Accept
multiple clause-argument-specifications separated by ';'.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_uses_allocators): Accept
multiple clause-argument-specifications separated by ';'.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_match_omp_clause_uses_allocators): Accept
multiple clause-argument-specifications separated by ';'.
gcc/ChangeLog:
* tree-pretty-print.cc (dump_omp_clause): For uses_allocators,
print modifier before allocator variable.
xtensa: Make the definition of xtensa_cstoresi_operator more appropriate
The description of 'cstoremode4' in gccint says:
"These operations may FAIL, but should do so only in relatively uncommon
cases; if they would FAIL for common cases involving integer comparisons,
it is best to restrict the predicates to not allow these operands."
-- 16.10 Standard Pattern Names For Generation, gccint (the latest)
Therefore, it is preferable to include unsigned comparisons in the operator
constraints of the pattern only if the machine instructions emitted by the
pattern require such comparisons.
gcc/ChangeLog:
* config/xtensa/predicates.md (xtensa_cstoresi_operator):
Change it to include unsigned comparisons only when TARGET_SALT is
enabled.
Jakub Jelinek [Fri, 19 Dec 2025 10:24:02 +0000 (11:24 +0100)]
c++: Fix stabilization of bitfields [PR122772]
The following testcase is rejected, because due to the C++17
b @= a ordering of side-effects cp_stabilize_reference is called
on the lhs of the compound assignment. For some cases
cp_stabilize_reference just uses stabilize_reference, but for other
cases it attempts to bind a reference to the expression.
This doesn't work for bit-fields and DECL_PACKED fields though,
we can't take address of a bit-field (nor DECL_PACKED field)
and error on that.
This patch introduces for this another wrapper around
stabilize_reference, which for clk_bitfield | clk_packed handles
some trees stabilize_reference doesn't handle correctly for C++,
and for the rest defers to stabilize_reference.
This way, we can introduce multiple SAVE_EXPRs (like stabilize_expr
itself already can as well), but can handle even the weirdest
lhs expressions for which lvalue_kind returns clk_bitfield or clk_packed
set.
2025-12-19 Jakub Jelinek <jakub@redhat.com>
PR c++/122772
* tree.cc (cp_stabilize_bitfield_reference): New function.
(cp_stabilize_reference): Use it for stabilization of
clk_bitfield or clk_packed lvalues.
Jakub Jelinek [Fri, 19 Dec 2025 10:12:21 +0000 (11:12 +0100)]
c++, dwarf2out: Debug info for namespace scope structured bindings [PR122968]
As the following testcase shows, we weren't emitting any debug info for
namespace scope structured bindings (except for tuple based ones, those
worked fine).
There are multiple problems:
1) for tuple based structured bindings there is cp_finish_decl and the
VAR_DECLs are registered in varpool, but the other ones are just
VAR_DECLs with DECL_VALUE_EXPR for elements of the underlying VAR_DECL,
so they don't really appear in the IL; and we weren't calling
early_global_decl debug hook on those
2) fixing that makes those appear only with -fno-eliminate-unused-symbols,
because whether something is used is determined by the presence of
varpool node for it; I think varpool is unprepared to handle
DECL_VALUE_EXPR VAR_DECLs, those would appear always unused;
so, the patch instead adds mapping from the underlying VAR_DECL
to the structured bindings needed for debug info in an artificial
attribute and when marking the underlying VAR_DECL of a structured
binding as used, it marks all the structured bindings attached to it
as well
3) with this, the DW_TAG_variable DIEs for structured bindings appear even
without -fno-eliminate-unused-symbols, but they still don't have
locations; for that we need to arrange when we add DW_AT_location to the
underlying variable to also add DW_AT_location to the structured bindings
afterwards
Note, this patch doesn't improve the structured bindings bound to bitfields
case the PR was filed for originally, neither at namespace scope nor at
block scope. That will need to be handled incrementally.
2025-12-19 Jakub Jelinek <jakub@redhat.com>
PR debug/122968
gcc/
* dwarf2out.cc (premark_used_variables): Handle "structured bindings"
attribute.
(dwarf2out_late_global_decl): Likewise.
gcc/cp/
* decl.cc (cp_finish_decomp): For structured bindings at namespace
scope which have DECL_HAS_VALUE_EXPR_P set, call early_global_decl
debug hook and put all such structured bindings into
"structured bindings" attribute arguments on the underlying decl.
gcc/testsuite/
* g++.dg/guality/decomp1.C: New test.
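For orientation, the two kinds of namespace scope structured bindings discussed above look like this. The names are hypothetical and the snippet assumes C++17; it is not the new guality test.

```cpp
#include <cassert>
#include <tuple>

namespace sb_demo
{
  struct point { int x; int y; };

  point origin{1, 2};
  // Non-tuple case: px and py are names for members of a hidden copy of
  // origin; this is the case that had no debug info.
  auto [px, py] = origin;

  std::tuple<int, long> pair_like{3, 4L};
  // Tuple-like case: t0 and t1 are separate variables; the message says
  // these already worked.
  auto [t0, t1] = pair_like;
}
```

Compiling such a file with -g and printing `sb_demo::px` in a debugger is one way to observe the difference, assuming the debugger supports the emitted DIEs.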
Jakub Jelinek [Fri, 19 Dec 2025 10:04:50 +0000 (11:04 +0100)]
c++: Fix up cp_compare_floating_point_conversion_ranks for dfp [PR122834]
The following testcase ICEs in cp_compare_floating_point_conversion_ranks,
when it is called on one extended floating point type (right now always
a binary floating point type) and a decimal floating point type (which
we currently handle as neither standard nor extended floating point type,
similarly to e.g. __float128 and similar types).
There is an assertion that fails in that case.
When no extended floating point types are involved, e.g. common type
choice is quite arbitrary if TYPE_PRECISION is the same, e.g.
auto a = 0.0DL + 1.0Q;
auto b = 1.0Q + 0.0DL;
chooses the first type in both cases, so decltype (0.0DL) in the first
case and __float128 in the second case.
Now, when one type is extended floating point, I think we should follow
the C++23 rules, which say that conversion ranks are unordered if the
set of the values of both types are neither proper subsets nor supersets
of the other, which is I think the case of binary vs. decimal,
e.g. 0.3D{F,D,L} is not exactly representable in any binary floating point
format and I thought e.g. (1.0FNN + __FLTNN_EPSILON__) * __FLTNN_MIN__
is not representable in any decimal floating point. At least, for
_Float32, it needs 112 decimal digits to represent it exactly
0.00000000000000000000000000000000000001175494490952133940450443629595204006810278684798281709160328881985245648433835441437622648663818836212158203125
and _Decimal128 has only 34 significant digits, for _Float64
that is already 767 significant digits etc.
Though, _Float16 is a different case.
The following helper program:
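#define __STDC_WANT_IEC_60559_TYPES_EXT__
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int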
main ()
{
char buf[256], *p, *q;
size_t l, ml = 0;
{
union { _Float16 x; unsigned short y; } u, v;
for (int i = 0; i < 0x7c00; ++i)
{
u.y = i;
_Float32 x = u.x;
strfromf32 (buf, 255, "%.254f", x);
for (p = buf; *p == '0' || *p == '.'; ++p)
;
if (*p == '\0')
continue;
for (q = strchr (p, '\0') - 1; *q == '0' || *q == '.'; --q)
;
q[1] = '\0';
l = strlen (p);
if (strchr (p, '.'))
--l;
if (ml < l)
ml = l;
}
}
printf ("%zd\n", ml);
ml = 0;
{
union { __bf16 x; unsigned short y; } u, v;
for (int i = 0; i < 0x7f80; ++i)
{
u.y = i;
_Float32 x = u.x;
strfromf32 (buf, 255, "%.254f", x);
for (p = buf; *p == '0' || *p == '.'; ++p)
;
if (*p == '\0')
continue;
for (q = strchr (p, '\0') - 1; *q == '0' || *q == '.'; --q)
;
q[1] = '\0';
l = strlen (p);
if (strchr (p, '.'))
--l;
if (ml < l)
ml = l;
}
}
printf ("%zd\n", ml);
}
prints
21
96
As _Decimal32 has 7 and _Decimal64 16 decimal digits, I think neither
_Float16 nor decltype (0.0bf16) is proper subset of values of those types,
but as _Decimal128 has 34 decimal digits, I'd say _Float16 is a proper
subset of _Decimal128 while decltype (0.0bf16) is not.
Example of the 21 decimal digits for _Float16 is
0x1.a3cp-14f16 (0x68f u.y), which is exactly 0.000100076198577880859375
2025-12-18 Jakub Jelinek <jakub@redhat.com>
PR c++/122834
* typeck.cc (cp_compare_floating_point_conversion_ranks): Return
3 if fmt2->b is 10 except for _Float16 vs. _Decimal128, in that
case return -2.
* g++.dg/dfp/pr122834-1.C: New test.
* g++.dg/dfp/pr122834-2.C: New test.
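The 21-digit example quoted above can be checked with a small C++ sketch. It assumes the C library prints exact decimal expansions for "%f" (glibc does); the helper names are hypothetical. The value 0x1.a3cp-14 equals 1679 * 2^-24, which is exactly representable as a double.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Number of significant decimal digits in the exact expansion of x.
// Assumes at most 160 fractional digits are needed and that snprintf
// prints the exact value rather than a rounded one.
int significant_decimal_digits(double x)
{
  char buf[512];
  std::snprintf(buf, sizeof buf, "%.160f", std::fabs(x));
  std::string digits;
  for (const char *p = buf; *p != '\0'; ++p)
    if (*p != '.')
      digits += *p;
  const std::string::size_type first = digits.find_first_not_of('0');
  if (first == std::string::npos)
    return 0;                       // x was zero
  const std::string::size_type last = digits.find_last_not_of('0');
  return static_cast<int>(last - first + 1);
}

// The _Float16 value with bit pattern 0x68f, i.e. 1679 * 2^-24.
double float16_example()
{
  return std::ldexp(1679.0, -24);
}
```

The result for `float16_example()` is below the 34 digits of _Decimal128, in line with the reasoning in the message.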
Jakub Jelinek [Fri, 19 Dec 2025 09:13:45 +0000 (10:13 +0100)]
c++: Reject array new with -fexceptions with deleted dtor [PR123030]
For array new and -fexceptions, we only try to build cleanup if
TYPE_HAS_NONTRIVIAL_DESTRUCTOR and so don't complain if the
array element has trivial but deleted destructor.
The following patch changes it to build the dtor whenever
type_build_dtor_call but only registers it as cleanup if the cleanup
has TREE_SIDE_EFFECTS. build_vec_delete_1 has a special
case for these type_build_dtor_call && !TYPE_HAS_NONTRIVIAL_DESTRUCTOR
cases where it does less work.
Though, I wonder if we also shouldn't test whether the ctor isn't noexcept,
then we wouldn't have to change the new4.C test. Though, clang++ rejects
that as well even when it has noexcept ctor.
2025-12-19 Jakub Jelinek <jakub@redhat.com>
PR c++/123030
* init.cc (build_vec_init): Call build_vec_delete_1 for -fexceptions
even if just type_build_dtor_call, not only when
TYPE_HAS_NONTRIVIAL_DESTRUCTOR. But register cleanups only
for TYPE_HAS_NONTRIVIAL_DESTRUCTOR.
* g++.dg/cpp0x/deleted18.C: New test.
* g++.dg/cpp0x/new4.C: Expect an error.
Jakub Jelinek [Fri, 19 Dec 2025 09:12:06 +0000 (10:12 +0100)]
c++: Fix up expansion statement handling
I've noticed that in many spots of the expansion statement handling I've
handled incorrectly the creation of VAR_DECLs which are constexpr
in the spec (or can be constexpr when user writes it that way).
All I've done was set DECL_DECLARED_CONSTEXPR_P and TREE_READONLY
flags on the VAR_DECL, but haven't made sure the TREE_TYPE is const
qualified as well (with the exception of references obviously).
Haven't touched spots which are always references, e.g. when it is
constexpr auto &&var etc.
Fixing this revealed some problems:
1) one fixed by first hunk in pt.cc, where the i variable was created
with get_target_expr and thus now is const as well and so operator++
on it doesn't work; used build_target_expr_with_type to make it
non-const
2) several tests got it wrong and didn't actually support calling
operator *, operator != and operator + on const objects; fixed by
making those operators const qualified.
2025-12-19 Jakub Jelinek <jakub@redhat.com>
* parser.cc (cp_build_range_for_decls): If expansion_stmt_p,
where we are setting DECL_DECLARED_CONSTEXPR_P on begin/end, use
const qualified iter_type.
* pt.cc (finish_expansion_stmt): Use build_target_expr_with_type
with cv_unqualified to create it instead of get_target_expr to
make it non-const qualified. When creating VAR_DECLs with
DECL_DECLARED_CONSTEXPR_P, make sure they have const qualified
type unless they are references.
Alexandre Oliva [Fri, 19 Dec 2025 07:57:05 +0000 (04:57 -0300)]
[lra] take scratch as implicit unused output reloads [PR55212]
When trying to convert the SH port to use LRA, the first issue I hit
was the need for dealing with former scratch registers at places we
didn't need to on other ports, treating them like unused output
reloads instead of rejecting them.
for gcc/ChangeLog
PR target/55212
* lra-constraints.cc (match_reload): Treat former scratch
regs as implicit unused output reloads.
(process_alt_operands): Likewise.
(curr_insn_transform): Likewise.
Nathaniel Shead [Sat, 6 Dec 2025 05:47:18 +0000 (16:47 +1100)]
c++/modules: Reattempt to complete ARRAY_TYPEs after reading a cluster [PR122922]
The PR raises an issue where we complain about value-initializing an
incomplete array type, where the element type is a complete type.
Here, the friend declaration brings TTensor<0> and TTensor<1> into the
same cluster, and we have no intra-cluster ordering that ensures
TTensor<0>'s definition is streamed before TTensor<1>'s definition is.
In general we don't currently do any ordering of definitions, we only
reorder in cases that a declaration depends on another.
In this particular case we happen to stream TTensor<1>'s definition
first, which builds an array type of TTensor<0>. At this point
TTensor<0>'s definition hasn't been streamed, so the array is considered
to be an array of incomplete type. Later we do stream TTensor<0>'s
definition, but we don't update the TYPE_SIZE etc. of the array type we
built earlier so build_value_init thinks we still have incomplete type
and errors.
Some possible approaches:
1. Have some post-processing for arrays of incomplete type during module
streaming; once we've finished reading the cluster we can loop
through those array types and attempt to complete them.
2. Add a dependency ordering between structs that have a field that's a
non-dependent array type of a different struct in the same cluster,
so that the latter is always streamed first. We shouldn't see cycles
because we cannot have two structs with arrays of each other. This
would require processing definitions though and I'm not convinced
this necessarily would fix the issue in all cases.
3. Add more calls to 'complete_type' when processing structure fields,
rather than assuming that if we have a complete record type all its
fields must also have been completed already. This seems error-prone
though, as we may miss cases. Unless perhaps we replace uses of
COMPLETE_TYPE_P entirely in the C++ frontend with a function that
attempts to complete the type and returns false if it failed?
This patch takes approach #1 as a minimal fix, but maybe it would be
worth exploring other approaches later.
PR c++/122922
gcc/cp/ChangeLog:
* module.cc (trees_in::post_types): New member.
(trees_in::trees_in): Initialize it.
(trees_in::~trees_in): Clean it up.
(trees_in::post_process_type): New functions.
(trees_in::tree_node): Save incomplete ARRAY_TYPEs for later
post-processing.
(module_state::read_cluster): Attempt to complete any
ARRAY_TYPEs we saved earlier.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr122922_a.C: New test.
* g++.dg/modules/pr122922_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
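Approach #1 can be pictured with a toy model. Everything here is hypothetical (`toy_type`, `toy_reader`, `read_array`, `finish_cluster`); only the member name `post_types` echoes the ChangeLog, and the real code works on trees rather than on this struct.

```cpp
#include <cassert>
#include <vector>

struct toy_type
{
  bool complete = false;
  toy_type *element = nullptr;   // non-null for array types
};

struct toy_reader
{
  std::vector<toy_type *> post_types;   // arrays to revisit later

  // Called when an array type is built while streaming a definition.
  void read_array(toy_type &array, toy_type &element)
  {
    array.element = &element;
    array.complete = element.complete;
    if (!array.complete)
      post_types.push_back(&array);     // element not streamed yet
  }

  // Called once the whole cluster has been read.
  void finish_cluster()
  {
    for (toy_type *array : post_types)
      if (array->element->complete)
        array->complete = true;
    post_types.clear();
  }
};
```

The point of the design is that no ordering between definitions in a cluster is needed; incomplete arrays are simply retried once everything has been read.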
Nathaniel Shead [Thu, 4 Dec 2025 14:14:36 +0000 (01:14 +1100)]
c++/modules: Don't build STAT_HACKs for current TU namespace bindings [PR122995]
The issue in the PR is that we're checking if the binding entity for the
current TU matches the namespace we're pushing. In this case the slot
however is a STAT_HACK we created during 'maybe_record_mergeable_decl'
to indicate that the binding entity contains a global module binding.
Adding '|| (STAT_HACK_P ((tree) slot) && STAT_DECL ((tree) slot) == ns)'
should fix the assertion, but I think we want to just not build the
STAT_HACK for namespaces, as they'll always be global module regardless,
and cannot match with any other declaration, so there's no need for the
special flag.
PR c++/122995
gcc/cp/ChangeLog:
* name-lookup.cc (maybe_record_mergeable_decl): Don't build a
STAT_HACK for namespaces.
gcc/testsuite/ChangeLog:
* g++.dg/modules/namespace-17_a.C: New test.
* g++.dg/modules/namespace-17_b.C: New test.
Lewis Hyatt [Tue, 16 Dec 2025 05:15:14 +0000 (00:15 -0500)]
configure: Support disabling specific languages [PR12407]
Sometimes it can be desirable to get the semantics of
--enable-languages=all, but to exclude one or more languages from the
build. Currently this is not directly supported; the best you can do is to
list the ones you do want to be built as arguments to --enable-languages.
In addition to being inconvenient, this also complicates cross-platform
portability, since --enable-languages=all carries the useful semantics that
unsupported languages will be skipped automatically; by contrast, languages
listed explicitly as arguments to --enable-languages will produce a hard
error if they are not supported.
This patch extends the syntax of --enable-languages so that, e.g.:
--enable-languages=all,^xyz,^abc
would build every supported language other than xyz and abc.
ChangeLog:
PR bootstrap/12407
* configure.ac: Add feature to parsing of --enable-languages so that
a language can be disabled by prefixing it with a caret.
* configure: Regenerate.
gcc/ChangeLog:
PR bootstrap/12407
* doc/install.texi (--enable-languages): Document the new language
exclusion feature.
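The intended semantics can be modelled in a few lines. This is an illustrative C++ model, not the shell logic in configure.ac; `resolve_languages` is a hypothetical name, and applying exclusions after all inclusions (and ignoring exclusions of unknown languages) are assumptions of the sketch.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

std::set<std::string>
resolve_languages(const std::string &request,
                  const std::set<std::string> &supported)
{
  std::set<std::string> enabled, excluded;
  std::istringstream in(request);
  std::string item;
  while (std::getline(in, item, ','))
    {
      if (item.empty())
        continue;
      if (item == "all")
        enabled.insert(supported.begin(), supported.end());
      else if (item[0] == '^')
        excluded.insert(item.substr(1));      // caret prefix: exclude
      else if (supported.count(item))
        enabled.insert(item);
      else
        // Explicitly listed but unsupported: hard error, as described.
        throw std::invalid_argument("unsupported language: " + item);
    }
  for (const std::string &lang : excluded)
    enabled.erase(lang);
  return enabled;
}
```

With a supported set of c, c++, fortran and ada, the request "all,^ada,^fortran" yields c and c++ in this model.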
LIU Hao [Wed, 3 Dec 2025 03:10:46 +0000 (11:10 +0800)]
libstdc++: On Windows, retrieve thread-local variables via functions
For Windows, GCC can be configured with `--enable-tls` to enable native TLS.
The native TLS implementation has a limitation that it is incapable of
exporting thread-local variables from DLLs. Therefore, they are retrieved
via getter functions instead.
libstdc++-v3/ChangeLog:
* config/os/mingw32-w64/os_defines.h (_GLIBCXX_NO_EXTERN_THREAD_LOCAL):
New macro.
* include/std/mutex [_GLIBCXX_NO_EXTERN_THREAD_LOCAL]
(__get_once_callable, __get_once_call): Declare new functions.
* src/c++11/mutex.cc [_GLIBCXX_NO_EXTERN_THREAD_LOCAL]
(__get_once_callable, __get_once_call): Define.
Signed-off-by: LIU Hao <lh_mouse@126.com> Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
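The getter technique can be sketched as below. The names and signatures are hypothetical and do not reproduce the declarations in <mutex>; the idea is only that the thread-local object stays private to the library and callers reach it through an ordinary function, which a DLL can export.

```cpp
#include <cassert>

namespace tls_sketch
{
  // Stays inside the library; not exported as a variable.
  thread_local void *once_callable = nullptr;

  // Exported accessor: returns a reference to the calling thread's copy.
  void *&get_once_callable()
  {
    return once_callable;
  }
}
```

Each thread that calls the accessor sees its own slot, so code outside the DLL never needs to name the thread-local variable itself.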
Add the assumption clause 'no_openmp_constructs' (which, like most assumption
clauses, is ignored in the front end for now).
For Fortran, improve free-form parsing of argument-free clauses
by avoiding substring matches.
Patrick Palka [Thu, 18 Dec 2025 19:24:54 +0000 (14:24 -0500)]
c++: restore printing 'typename' for none_type tag
The drive-by change in r16-6144 to have dump_type not print any tag name
for none_type TYPENAME_TYPE caused some diagnostic regressions in C++20
mode where an expected 'typename' is now missing in the error message:
g++.dg/concepts/diagnostic5.C (test for warnings, line 5)
g++.old-deja/g++.pt/typename3.C (test for errors, line 20)
g++.old-deja/g++.pt/typename4.C (test for errors, line 25)
g++.old-deja/g++.pt/typename6.C (test for errors, line 18)
The first test shows we no longer print 'typename' when diagnosing a
failed type-requirement:
• in requirements [with T = char]
gcc/testsuite/g++.dg/concepts/diagnostic5.C:5:16:
5 | concept c1 = requires { typename T::blah; };
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
• the required type ‘T::blah’ is invalid
which is undesirable since 'typename' has been explicitly written.
(The other three tests are less interesting since they're error-recovery.)
The TYPENAME_TYPE in question is being built with a none_type tag due
to cp_parser_type_name passing tag_type=none_type to cp_parser_class_name.
This seems wrong at least when typename_keyword_p=true. But rather than
continue messing with this old and delicate part of the parser, this patch
just restores printing the 'typename' prefix for none_type TYPENAME_TYPEs.
gcc/cp/ChangeLog:
* decl.cc (tag_name) <case none_type>: Return "typename" as if
typename_type.
This change causes ICE e.g. on the following testcase.
The problem is that build_typename_type expects IDENTIFIER_NODE as the
second argument, e.g. it uses it as
tree d = build_decl (input_location, TYPE_DECL, name, t);
argument. But TYPE_NAME doesn't have to be an IDENTIFIER_NODE, it can
be a TYPE_DECL too and when we build a TYPE_DECL with TYPE_DECL as
DECL_NAME, it breaks all kinds of assumptions everywhere in the FE as well
as middle-end.
Fixed by using TYPE_IDENTIFIER instead.
2025-12-18 Jakub Jelinek <jakub@redhat.com>
PR c++/123186
* parser.cc (cp_parser_template_id): Use TYPE_IDENTIFIER instead of
TYPE_NAME in second build_typename_type argument.
Egas Ribeiro [Sat, 13 Dec 2025 13:14:47 +0000 (13:14 +0000)]
c++: Fix ICE with type aliases in inherited CTAD [PR122070]
When processing inherited CTAD in C++23, type_targs_deducible_from can
be called with a synthetic alias template whose TREE_VALUE is a type
alias. Since TYPE_TEMPLATE_INFO_MAYBE_ALIAS can return NULL for type
aliases, we need to fall back to TYPE_TEMPLATE_INFO to get the template
info of the underlying type before calling TI_TEMPLATE, which should
always be non-NULL when called from inherited_ctad_tweaks.
PR c++/122070
gcc/cp/ChangeLog:
* pt.cc (type_targs_deducible_from): Fall back to
TYPE_TEMPLATE_INFO when TYPE_TEMPLATE_INFO_MAYBE_ALIAS is NULL.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/class-deduction-inherited10.C: New test.
* g++.dg/cpp23/class-deduction-inherited9.C: New test.
Signed-off-by: Egas Ribeiro <egas.g.ribeiro@gmail.com> Co-authored-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Patrick Palka <ppalka@redhat.com>
Jonathan Wakely [Thu, 18 Dec 2025 10:38:31 +0000 (10:38 +0000)]
libstdc++: Fix ranges::stable_sort handling of null buffer [PR123180]
The logic of the null pointer check got reversed when converting the
std::stable_sort code for ranges::stable_sort.
libstdc++-v3/ChangeLog:
PR libstdc++/123180
* include/bits/ranges_algo.h (__stable_sort_fn::operator()): Fix
sense of null check. Replace typedef with alias-declaration.
* testsuite/25_algorithms/stable_sort/123180.cc: New test.
This patch enables ACLE macro __ARM_FEATURE_SVE_PREDICATE_OPERATORS to indicate
that C/C++ language operations are available natively on SVE ACLE type svbool_t.
* gcc.target/aarch64/sve/acle/general/attributes_1.c: Update test for
__ARM_FEATURE_SVE_PREDICATE_OPERATORS.
* gcc.target/aarch64/sve/acle/general/attributes_9.c: New.
Given that profile probability is computed as an unsigned integer
value in the [0, max_probability = (uint32_t) 1 << (n_bits - 2)]
range (as opposed to a [0, 1] float), 50/50 likelihoods are encoded as
`even()', mapping to `max_probability / 2'.
The previous use of 0.5 for an even probability was, as a consequence
of the implicit `double' -> `uint32_t' conversion, silently set to 0 by
GCC when not using the `-Wconversion' flag.
We therefore replace the erroneous `probability (0.5, GUESSED)'
initialization with its correct `profile_probability::even ()'
counterpart.
gcc/ChangeLog:
PR tree-optimization/123153
* tree-vect-loop-manip.cc
(slpeel_tree_duplicate_loop_to_edge_cfg): use
profile_probability::even () for even likelihood.
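The truncation can be reproduced with a toy fixed-point type. `toy_probability` and `from_scaled` are hypothetical, and the value n_bits = 29 is an assumption made for the sketch; the explicit cast stands in for the implicit double to uint32_t conversion described above.

```cpp
#include <cassert>
#include <cstdint>

struct toy_probability
{
  static constexpr unsigned n_bits = 29;   // assumed width for the sketch
  static constexpr std::uint32_t max_probability
    = (std::uint32_t) 1 << (n_bits - 2);

  std::uint32_t value;                     // scaled to [0, max_probability]

  static constexpr toy_probability even()
  {
    return toy_probability{max_probability / 2};
  }
};

// Takes an already scaled integer, like the constructor in the message.
constexpr toy_probability from_scaled(std::uint32_t v)
{
  return toy_probability{v};
}
```

Passing 0.5 where a scaled integer is expected therefore means "never", whereas `even()` yields half of the maximum.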
Jonathan Wakely [Wed, 17 Dec 2025 18:36:36 +0000 (18:36 +0000)]
libstdc++: Fix up std::generate_canonical for 32-bit arches
Make use of __detail::_Select_uint_least_t<d>::type for
std::generate_canonical, so that we choose an appropriately sized
integer based on the number of bits needed, and so we have a 128-bit
integer type even on 32-bit targets (via the new __rand_uint128 class).
libstdc++-v3/ChangeLog:
* include/bits/random.tcc (__generate_canonical_pow2): Adjust
comments. Remove _UInt template parameter and define it in the
body using _Select_uint_least_t<__d>. Remove popcount call for
getting the width of the _UInt type. Cast floating-point
literal to _RealT.
(__generate_canonical_any): Remove _UInt template parameter and
define it in the body using _Select_uint_least_t<__d * 2>. Use
direct-initialization for _UInt variables. Cast floating-point
literal to _RealT.
(generate_canonical): Remove unused typedef. Remove constexpr-if
branches and remove unsigned type from template argument lists.
Co-authored-by: Jakub Jelinek <jakub@redhat.com> Reviewed-by: Nathan Myers <nmyers@redhat.com>
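Choosing an integer type from a bit count can be sketched with std::conditional. This is an illustrative analogue with a hypothetical name, limited to 64 bits; the libstdc++ helper is described above as also providing a 128-bit type via __rand_uint128, which this sketch does not model.

```cpp
#include <cstdint>
#include <type_traits>

// Smallest of uint16_t, uint32_t, uint64_t with at least Bits bits.
template<unsigned Bits>
struct select_uint_least_sketch
{
  static_assert(Bits <= 64, "this sketch stops at 64 bits");
  using type = typename std::conditional<
    (Bits <= 16), std::uint16_t,
    typename std::conditional<(Bits <= 32), std::uint32_t,
                              std::uint64_t>::type>::type;
};
```

Defining the type inside the function body from the digit count, as the ChangeLog describes, removes the need for callers to pass an integer type as a template argument.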
Alfie Richards [Wed, 29 Oct 2025 13:29:10 +0000 (13:29 +0000)]
aarch64: Add new target options for 2024 Architecture Extension and Armv9.6-A
This does not add support for these versions (and the corresponding
__ARM_FEATURE_<X> macros aren't implemented for this reason) but
accepts the command line strings and allows these to be passed on to
the assembler.
Armv9.6-A is supported by the new "armv9.6-a" option and defined as
"armv9.5-a+cmpbr+lsui+occmo"
Alfie Richards [Wed, 29 Oct 2025 13:29:10 +0000 (13:29 +0000)]
aarch64: Split sve2-X extensions into sve2 + sve-X extension.
Changes the "sve2-sm4", "sve2-sha3", "sve2-bitperm", and "sve2-aes"
to be aliases which imply both "sve2" and the new option "sve-sm4",
"sve-sha3", "sve-bitperm", or "sve-aes" respectively.
The EXPLICIT_OFF values are chosen to preserve the existing behaviour of
+nosve2-X.
This granularity is needed to model the 2024 Architecture Extensions
dependencies.
gcc/ChangeLog:
* config/aarch64/aarch64-option-extensions.def
(sve-aes): New cli extension option.
(sve2-aes): Changed to be alias of sve2+sve-aes.
(sve-bitperm): New cli extension option.
(sve2-bitperm): Changed to be alias of sve2+sve-bitperm.
(sve-sha3): New cli extension option.
(sve2-sha3): Changed to be alias of sve2+sve-sha3.
(sve-sm4): New cli extension option.
(sve2-sm4): Changed to be alias of sve2+sve-sm4.
* config/aarch64/aarch64.h (TARGET_SVE2_AES): Updated to require
sve2+sve-aes.
(TARGET_SVE2_BITPERM): Updated to require sve2+sve-bitperm.
(TARGET_SVE2_SHA3): Updated to require sve2+sve-sha3.
(TARGET_SVE2_SM4): Updated to require sve2+sve-sm4.
* config/aarch64/aarch64-sve-builtins-sve2.def: Update gating for sve2-X
intrinsics.
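The alias relationship described above amounts to a lookup from an old option name to its two constituents. The table below restates the message; the function `expand_extension` is hypothetical and unrelated to the macros in aarch64-option-extensions.def.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Expand an extension name into the features it implies in this model.
std::set<std::string> expand_extension(const std::string &name)
{
  static const std::map<std::string, std::set<std::string>> aliases = {
    {"sve2-aes",     {"sve2", "sve-aes"}},
    {"sve2-bitperm", {"sve2", "sve-bitperm"}},
    {"sve2-sha3",    {"sve2", "sve-sha3"}},
    {"sve2-sm4",     {"sve2", "sve-sm4"}},
  };
  const auto it = aliases.find(name);
  if (it != aliases.end())
    return it->second;
  return {name};                 // not an alias: stands for itself
}
```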
Alfie Richards [Wed, 12 Nov 2025 16:30:45 +0000 (16:30 +0000)]
aarch64: Add alias option support
Adds the AARCH64_OPT_EXTENSION_ALIAS macro to aarch64-option-extensions.def
to define architecture features which gate no features themselves, but
act as aliases for other features.
When getting the extension string for some architecture flags, alias features
should be used over their constituent features, even if some of the constituent
features are enabled transitively by other features.
Changes +crypto option to use this macro.
gcc/ChangeLog:
* common/config/aarch64/aarch64-common.cc
(struct aarch64_extension_info): Add flags_alias_preferred_over.
(AARCH64_OPT_EXTENSION): Add setting flags preferred over to 0.
(AARCH64_OPT_EXTENSION_ALIAS): New macro def.
(aarch64_get_extension_string_for_isa_flags): Update to use alias
extensions over constituent extensions.
* config/aarch64/aarch64-feature-deps.h (alias_flags): New variable.
(AARCH64_OPT_EXTENSION): New macro def.
(AARCH64_OPT_EXTENSION_ALIAS): New macro def.
(HANDLE): Update to use alias_flags.
(AARCH64_CORE): Update to use alias_flags.
* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION_ALIAS): New macro def to
define alias_prefer_over_flags_X.
(crypto): Update to use AARCH64_OPT_EXTENSION_ALIAS.
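The "alias preferred over constituents" rule for building the extension string can be sketched as follows. This is a simplified model under stated assumptions, not the code in aarch64_get_extension_string_for_isa_flags: the struct ext table, the F_* bits, ext_string (), and the choice of "aes" and "sha2" as the constituents of "crypto" are all illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical feature bits, for illustration only.  */
enum { F_AES = 1u << 0, F_SHA2 = 1u << 1, F_SIMD = 1u << 2 };

struct ext { const char *name; uint32_t flags; int is_alias; };

/* Aliases are listed first so they are considered before the features
   they stand for.  */
static const struct ext exts[] = {
  { "crypto", F_AES | F_SHA2, 1 },
  { "aes",    F_AES,          0 },
  { "sha2",   F_SHA2,         0 },
  { "simd",   F_SIMD,         0 }
};

/* Write "+name" for each enabled extension into BUF (SIZE must be > 0),
   naming an alias instead of its constituents when all of them are on.  */
static void
ext_string (uint32_t flags, char *buf, size_t size)
{
  uint32_t covered = 0;  /* Bits already named through an alias.  */
  size_t used = 0;
  buf[0] = '\0';
  for (size_t i = 0; i < sizeof exts / sizeof exts[0]; i++)
    {
      const struct ext *e = &exts[i];
      if ((flags & e->flags) != e->flags)
        continue;  /* Not fully enabled.  */
      if (!e->is_alias && (covered & e->flags) == e->flags)
        continue;  /* Already expressed by an alias.  */
      int n = snprintf (buf + used, size - used, "+%s", e->name);
      if (n < 0 || (size_t) n >= size - used)
        break;     /* Out of space; stop.  */
      used += (size_t) n;
      if (e->is_alias)
        covered |= e->flags;
    }
}
```

With all three bits set, the sketch names the alias and skips "aes" and "sha2"; with only F_AES and F_SIMD set, the alias does not apply and the individual names are emitted.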
chenxiaolong [Thu, 11 Dec 2025 02:49:05 +0000 (10:49 +0800)]
LoongArch: Add support for the TARGET_MODES_TIEABLE_P vectorization type.
v1->v2:
Add the TARGET_MODES_TIEABLE_P function description and analyze the
reasons for the cost change of Subreg type rtx after supporting
vectorization.
This hook returns true if a value of mode mode1 is accessible in mode
mode2 without copying. On LoongArch, the vector modes V4SF and V8SF can
share the lower 128 bits of data. With vector support added to this hook,
the cost of the subreg type conversion from V4SF to V8SF registers
becomes zero, and some rtx optimizations can be completed during the
combine pass. The comparison of the backend's vector support before and
after is as follows:
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_modes_tieable_p):
Add support for vector conversion.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/vect-extract-256-128.c:
After supporting the vectorized type corresponding to subreg in
the backend, the cost of rtx becomes 0. In fwprop1 pass,
memory-loaded rtx cannot be propagated to this insn, which leads
to xvld not being optimized into vld instructions.
* gcc.target/loongarch/vect-mode-tieable.c: New test.
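For orientation, the kind of source that accesses the low 128 bits of a 256-bit vector can be sketched with GCC's generic vector extensions. This is an illustrative sketch only, not the contents of the tests named above; the typedef names and extract_low () are assumptions, and whether the compiler represents the access as a subreg depends on the target and options.

```c
/* 128-bit and 256-bit float vectors via GCC's generic vector extension.
   The typedef names are chosen to mirror the V4SF/V8SF mode names.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef float v8sf __attribute__ ((vector_size (32)));

/* Copy the four low lanes (the lower 128 bits) of *SRC into *DST.
   Pointers are used so the example does not depend on how the target
   ABI passes 256-bit vectors by value.  */
static void
extract_low (const v8sf *src, v4sf *dst)
{
  for (int i = 0; i < 4; i++)
    (*dst)[i] = (*src)[i];
}
```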
Richard Biener [Wed, 17 Dec 2025 13:38:23 +0000 (14:38 +0100)]
c/123156 - overflow in shuffle mask for __builtin_shufflevector
At some point the permute vector element type had to match the value
element in size, which easily leads to overflow for char element types
as shown in the testcase. This was relaxed for constant permute
masks, so use ssizetype.
PR c/123156
gcc/c-family/
* c-common.cc (c_build_shufflevector): Use ssizetype for the
permute vector element type.
gcc/testsuite/
* gcc.dg/torture/builtin-shufflevector-pr123156.c: New testcase.