Tom Tromey [Tue, 23 Sep 2025 15:36:47 +0000 (09:36 -0600)]
ada: Add Visitor generic to Repinfo
For a gnat-llvm debuginfo patch, it was convenient to be able to
inspect the expressions created during back-annotation. This patch
adds a new generic Visit procedure that can be implemented to allow
such inspection. List_GCC_Expression is reimplemented in terms of
this procedure as a proof of concept.
gcc/ada/ChangeLog:
* repinfo.adb (Visit): New procedure.
(List_GCC_Expression): Rewrite.
* repinfo.ads (Visit): New generic procedure.
VADS inline assembly works by using a qualified expression for one of
the types defined in the Machine_Code package, e.g.
procedure P is
begin
code_2'(INSTR, OPERAND1, OPERAND2);
end P;
This is different from GNAT's own inline assembly machinery, which
instead expects a call to Machine_Code.ASM with a set of
differently-typed arguments.
This incompatibility is preventing GNATSAS' GNAT-Warnings engine from
analyzing VADS code, hence we adapt sem_ch13.adb to not fail on such
constructs when GNAT is running under both Check_Semantics_Only_Mode and
Relaxed_RM_Semantics mode.
gcc/ada/ChangeLog:
* sem_ch13.adb (Analyze_Code_Statement): Do not emit error
message when only checking relaxed semantics.
Eric Botcazou [Fri, 14 Nov 2025 21:39:51 +0000 (22:39 +0100)]
ada: Streamline processing for shared passive and protected objects
The Add_Shared_Var_Lock_Procs procedure in Exp_Smem contains a very ad-hoc
management of transient scopes, which is probably unavoidable but can be
streamlined by changing the insertion point of the finalizer to be the one
used in the presence of controlled objects.
However, the latter change badly interacts with the special processing of
protected subprogram bodies implemented in Build_Finalizer_Call. Now this
processing is obsolete after the recent overhaul of the expansion of these
protected subprogram bodies and can be entirely removed.
No functional changes.
gcc/ada/ChangeLog:
* exp_ch7.adb (Build_Finalizer_Call): Delete.
(Build_Finalizer): Always insert the finalizer at the end of the
statement list in the non-package case.
(Expand_Cleanup_Actions): Attach the finalizer manually, if any.
* exp_smem.adb (Add_Shared_Var_Lock_Procs): Insert all the actions
directly in the transient scope.
Eric Botcazou [Sun, 16 Nov 2025 13:29:39 +0000 (14:29 +0100)]
ada: Couple of small and unrelated cleanups
No functional changes.
gcc/ada/ChangeLog:
* exp_ch11.adb (Expand_N_Handled_Sequence_Of_Statements): Merge the
elsif condition with the if condition for cleanup actions.
* sem_ch6.adb (Analyze_Procedure_Call.Analyze_Call_And_Resolve): Get
rid of if statement whose condition is always true.
* sinfo.ads (Finally_Statements): Document their purpose.
Eric Botcazou [Mon, 17 Nov 2025 07:45:21 +0000 (08:45 +0100)]
ada: Streamline implementation of masters in Exp_Ch9
The incidental discovery of an old issue and its resolution has exposed the
convoluted handling of masters in Exp_Ch9, which uses two totally different
approaches to achieve the same goal, respectively in Build_Master_Entity and
Build_Class_Wide_Master, the latter being quite hard to follow. The handling
of activation chains for extended return statements is also a bit complex.
This gets rid of the second approach entirely for masters and makes
the handling of activation chains uniform for all nodes.
No functional changes.
gcc/ada/ChangeLog:
* gen_il-gen-gen_nodes.adb (N_Extended_Return_Statement): Add
Activation_Chain_Entity semantic field.
* exp_ch3.adb (Build_Master): Use Build_Master_{Entity,Renaming} in
all cases.
(Expand_N_Object_Declaration): Small tweak.
* exp_ch6.adb (Make_Build_In_Place_Iface_Call_In_Allocator): Use
Build_Master_{Entity,Renaming} to build the master.
* exp_ch7.adb (Expand_N_Package_Declaration): Do not guard the call
to Build_Task_Activation_Call for the sake of consistency.
* exp_ch9.ads (Build_Class_Wide_Master): Delete.
(Find_Master_Scope): Likewise.
(Build_Protected_Subprogram_Call_Cleanup): Move to...
(First_Protected_Operation): Move to...
(Mark_Construct_As_Task_Master): New procedure.
* exp_ch9.adb (Build_Protected_Subprogram_Call_Cleanup): ...here.
(First_Protected_Operation): ...here.
(Build_Activation_Chain_Entity): Streamline handling of extended
return statements.
(Build_Class_Wide_Master): Delete.
(Build_Master_Entity): Streamline handling of extended return
statements and call Mark_Construct_As_Task_Master on the context.
(Build_Task_Activation_Call): Assert that the owner is not an
extended return statement.
(Find_Master_Scope): Delete.
(Mark_Construct_As_Task_Master): New procedure.
* sem_ch3.adb (Access_Definition): Use Build_Master_{Entity,Renaming}
in all cases to build a master.
* sem_ch6.adb (Check_Anonymous_Return): Rename to...
(Check_Anonymous_Access_Return_With_Tasks): ...this. At the end,
call Mark_Construct_As_Task_Master on the parent node.
(Analyze_Subprogram_Body_Helper): Adjust to above renaming.
(Create_Extra_Formals): Do not set Has_Master_Entity here.
* sinfo.ads (Activation_Chain_Entity): Adjust description.
Bob Duff [Fri, 14 Nov 2025 21:29:45 +0000 (16:29 -0500)]
ada: VAST found bug: Missing Parent in annotate aspect
In case of an Annotate aspect of the form "Annotate => Expr",
where Expr is an identifier (as opposed to an aggregate),
the Parent field of the N_Identifier node for Expr was
destroyed. This patch changes the code that turns the aspect
into a pragma, so that it no longer has that bug.
The problem was in "New_List (Expr)", which sets the Parent of
Expr to Empty. But Expr is still part of the tree of the aspect,
so it should have a proper Parent; we can't just stick it in a
temporary list.
The new algorithm constructs the pragma arguments without disturbing
the tree of the aspect.
This is the last known case of missing Parent fields, so we can
now enable the VAST check that detected this bug.
gcc/ada/ChangeLog:
* sem_ch13.adb (Aspect_Annotate): Avoid disturbing the tree of the
aspect.
* vast.adb: Enable Check_Parent_Present.
* exp_ch6.adb (Validate_Subprogram_Calls): Minor reformatting.
Eric Botcazou [Thu, 13 Nov 2025 20:12:54 +0000 (21:12 +0100)]
ada: Fix fallout of recent finalization fix for limited types
The recent finalization fix made for limited types has uncovered cases where
the object returned by calls to build-in-place functions was not finalized
in selected anonymous contexts, most notably the dependent expressions of
conditional expressions. The specific finalization machinery that handles
conditional expressions requires the temporaries built for their dependent
expressions to be visible as early as possible, and this was not the case.
gcc/ada/ChangeLog:
* exp_ch4.adb (Expand_N_Case_Expression): When not optimizing for a
specific context, call Make_Build_In_Place_Call_In_Anonymous_Context
on expressions of alternatives when they are calls to BIP functions.
(Expand_N_If_Expression): Likewise for the Then & Else expressions.
Bob Duff [Thu, 13 Nov 2025 16:40:55 +0000 (11:40 -0500)]
ada: VAST: Check basic tree properties
Miscellaneous improvements to VAST. Mostly debugging improvements.
Move the call to VAST from Frontend to Gnat1drv, because
there is code AFTER the call to Frontend that notices
certain errors, and disables the back end. We want VAST
to be enabled only when the back end will be called.
This is needed to enable Check_Error_Nodes, among other
things.
gcc/ada/ChangeLog:
* frontend.adb: Move call to VAST from here...
* gnat1drv.adb: ...to here.
* vast.ads (VAST_If_Enabled): Rename main entry point of VAST from
VAST to VAST_If_Enabled.
* vast.adb: Miscellaneous improvements. Mostly debugging
improvements. Also enable Check_Error_Nodes. Also add checks:
Check_FE_Only, Check_Scope_Present, Check_Scope_Correct.
* debug.ads: Minor comment tweaks. The comment, "In the checks off
version of debug, the call to Set_Debug_Flag is always a null
operation." appears to be false, so is removed.
* debug.adb: Minor: Remove some code duplication.
* sinfo-utils.adb (nnd): Add comment warning about C vs. Ada
confusion.
Eric Botcazou [Thu, 13 Nov 2025 08:16:52 +0000 (09:16 +0100)]
ada: Fix missing activation of task returned through class-wide type
This fixes an old issue whereby a task returned through the class-wide type
of a limited record type is not activated by the caller, because it is not
moved onto the activation chain that the caller passes to the function.
gcc/ada/ChangeLog:
* exp_ch6.ads (Needs_BIP_Task_Actuals): Adjust description.
* exp_ch6.adb (Expand_N_Extended_Return_Statement): Move activation
chain for every build-in-place function with task formal parameters
when the type of the return object might have tasks.
A recent patch made Multi_Module_Symbolic_Traceback have two consecutive
formal parameters of type Boolean, which opens the door for mixing up
actual parameters in calls. And that mistake was actually made in a call
introduced by the same patch.
This commit fixes the call and also introduces a new enumerated type to
make this kind of mistake less likely in the future.
gcc/ada/ChangeLog:
* libgnat/s-dwalin.ads (Display_Mode_Type): New enumerated type.
(Symbolic_Traceback): Use new type in profile.
* libgnat/s-dwalin.adb (Symbolic_Traceback): Use new type in profile
and adapt body.
* libgnat/s-trasym__dwarf.adb (Multi_Module_Symbolic_Traceback): Fix
wrong call in body of one overload. Use new type in profile. Adapt
body.
(Symbolic_Traceback, Symbolic_Traceback_No_Lock,
Module_Symbolic_Traceback): Use new type in profile and adapt body.
(Calling_Entity): Adapt body.
Jakub Jelinek [Thu, 27 Nov 2025 12:55:17 +0000 (13:55 +0100)]
bitint: Fix up big-endian handling in limb_access [PR122714]
The bitint_extended changes in limb_access broke bitint_big_endian.
As we sometimes (for bitint_extended) access the MEM_REFs using
atype rather than m_limb_type, for big-endian we need to adjust
the MEM_REFs offset if atype has smaller TYPE_SIZE_UNIT than m_limb_size.
2025-11-27 Jakub Jelinek <jakub@redhat.com>
PR target/122714
* gimple-lower-bitint.cc (bitint_large_huge::limb_access): Adjust
MEM_REFs offset for bitint_big_endian if ltype doesn't have the
same byte size as m_limb_type.
Richard Biener [Thu, 27 Nov 2025 09:56:43 +0000 (10:56 +0100)]
Fix OMP SIMD clone mask record/get again
Post-checkin CI detected aarch64 fallout for the last change. AArch64
has ABI twists that run into a case where an unmasked call when loop
masked allows for a mask that has different shape than that of the
return value which in turn has different type than that of an actual
argument.
While we do not support a mismatch of the call mask shape with the
OMP SIMD ABI mask shape, we have no such restriction when there is
no call mask.
So the following fixes the record/get of a loop mask in the unmasked
call case, also fixing a latent issue present before. In particular
do not record a random scalar operand as representing the mask.
A testcase is in gcc.target/aarch64/vect-simd-clone-4.c.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Fix
recording of the mask type again. Adjust placing of
mask arguments for non-masked calls.
Dhruv Chawla [Thu, 27 Nov 2025 11:12:33 +0000 (12:12 +0100)]
remove patterns for (y << x) {<,<=,>,>=} x [PR122733]
These patterns should not be in match.pd as they require range
information checks that ideally belong in VRP. They were also causing
breakages as the checks weren't tight enough.
PR tree-optimization/122733
* match.pd ((y << x) {<,<=,>,>=} x): Remove.
((y << x) {==,!=} x): Call constant_boolean_node instead of
build_one_cst/build_zero_cst and combine into one pattern.
* gcc.dg/match-shift-cmp-1.c: Update test to only check
equality.
* gcc.dg/match-shift-cmp-2.c: Likewise.
* gcc.dg/match-shift-cmp-3.c: Likewise.
* gcc.dg/match-shift-cmp-4.c: Removed.
Jakub Jelinek [Thu, 27 Nov 2025 10:57:02 +0000 (11:57 +0100)]
fold-const, match.pd: Pass stmt to expr_not_equal if possible
The following patch is a small extension of the previous patch to pass stmt
context to the ranger queries from match.pd where possible, so that we can
use local ranges on a particular statement rather than global ones.
expr_not_equal_to also uses the ranger, so when possible this passes it
the statement context.
2025-11-27 Jakub Jelinek <jakub@redhat.com>
* fold-const.h (expr_not_equal_to): Add gimple * argument defaulted
to NULL.
* fold-const.cc (expr_not_equal_to): Likewise, pass it through to
range_of_expr.
* generic-match-head.cc (gimple_match_ctx): New static inline.
* match.pd (X % -Y -> X % Y): Capture NEGATE and pass
gimple_match_ctx (@2) as new 3rd argument to expr_not_equal_to.
((A * C) +- (B * C) -> (A+-B) * C): Pass gimple_match_ctx (@3)
as new 3rd argument to expr_not_equal_to.
(a rrotate (bitsize-b) -> a lrotate b): Likewise.
On Wed, Nov 26, 2025 at 09:52:50AM +0100, Richard Biener wrote:
> I wonder if it makes sense to wrap
> get_range_query (cfun)->range_of_expr (r, @0, gimple_match_ctx (@4))
> into sth like gimple_match_range_of_expr (r, @0, @4)?
It does make sense, so the following patch implements that.
Note, gimple-match.h is a bad location for that helper, because
lots of users use it without having value-range.h included and
it is for APIs to use the gimple folders, not for match.pd helpers
themselves, so I've moved gimple_match_ctx there as well.
2025-11-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119683
* gimple-match.h (gimple_match_ctx): Move to ...
* gimple-match-head.cc (gimple_match_ctx): ... here. Make static.
(gimple_match_range_of_expr): New static inline.
* match.pd ((mult (plus:s (mult:s @0 @1) @2) @3)): Use
gimple_match_range_of_expr.
((plus (mult:s (plus:s @0 @1) @2) @3)): Likewise.
((t * u) / u -> t): Likewise.
((t * u) / v -> t * (u / v)): Likewise.
((X + M*N) / N -> X / N + M): Likewise.
((X - M*N) / N -> X / N - M): Likewise.
((X + C) / N -> X / N + C / N): Likewise.
(((T)(A)) + CST -> (T)(A + CST)): Likewise.
(x_5 == cstN ? cst4 : cst3): Likewise. Do r.set_varying
even when gimple_match_range_of_expr failed.
Richard Biener [Thu, 27 Nov 2025 09:04:19 +0000 (10:04 +0100)]
tree-optimization/122885 - avoid re-using accumulator for some bool vectors
When boolean vectors do not use vector integer modes we are not
set up to produce the partial epilog in a correctly typed way,
so avoid this situation. For the integer mode case we are able
to pun things correctly, so keep that working.
PR tree-optimization/122885
* tree-vect-loop.cc (vect_find_reusable_accumulator): Reject
mask vectors which do not use integer vector modes.
(vect_create_partial_epilog): Assert the same.
Jonathan Wakely [Sat, 15 Nov 2025 18:19:28 +0000 (18:19 +0000)]
libstdc++: Future-proof C++20 atomic wait/notify
This will allow us to extend atomic waiting functions to support a
possible future 64-bit version of futex, as well as supporting
futex-like wait/wake primitives on other targets (e.g. macOS has
os_sync_wait_on_address and FreeBSD has _umtx_op).
Before this change, the decision of whether to do a proxy wait or to
wait on the atomic variable itself was made in the header at
compile-time, which made it an ABI property that could not be changed
later. That would have meant that
std::atomic<uint64_t> would always have to do a proxy wait even if Linux
gains support for 64-bit futex2(2) calls at some point in the future.
The disadvantage of proxy waits is that several distinct atomic objects
can share the same proxy state, leading to contention between threads
even when they are not waiting on the same atomic object, similar to
false sharing. It also results in spurious wake-ups because doing a
notify on an atomic object that uses a proxy wait will wake all waiters
sharing the proxy.
For types that are known to definitely not need a proxy wait (e.g. int
on Linux) the header can still choose a more efficient path at
compile-time. But for other types, the decision of whether to do a proxy
wait is deferred to runtime, inside the library internals. This will
make it possible for future versions of libstdc++.so to extend the set
of types which don't need to use proxy waits, without ABI changes.
The way the change works is to stop using the __proxy_wait flag that was
set by the inline code in the headers. Instead the __wait_args struct
has an extra pointer member which the library internals populate with
either the address of the atomic object or the _M_ver counter in the
proxy state. There is also a new _M_obj_size member which stores the
size of the atomic object, so that the library can decide whether a
proxy is needed. So, for example, if Linux gains 64-bit futex support then
the library can decide not to use a proxy when _M_obj_size == 8.
Finally, the _M_old member of the __wait_args struct is changed to
uint64_t so that it has room to store 64-bit values, not just whatever
size the __platform_wait_t type is (which is a 32-bit int on Linux).
Similarly, the _M_val member of __wait_result_type changes to uint64_t
too.
libstdc++-v3/ChangeLog:
* config/abi/pre/gnu.ver: Adjust exports.
* include/bits/atomic_timed_wait.h (_GLIBCXX_HAVE_PLATFORM_TIMED_WAIT):
Do not define this macro.
(__atomic_wait_address_until_v, __atomic_wait_address_for_v):
Adjust assertions to check that __platform_wait_uses_type is
true.
* include/bits/atomic_wait.h (__waitable): New concept.
(__platform_wait_uses_type): Define separately for platforms
with and without platform wait.
(_GLIBCXX_HAVE_PLATFORM_WAIT): Do not define this macro.
(__wait_value_type): New typedef.
(__wait_result_type): Change _M_val to __wait_value_type.
(__wait_flags): Remove __proxy_wait enumerator. Reduce range
reserved for ABI version by the commented-out value.
(__wait_args_base::_M_old): Change type to uint64_t.
(__wait_args_base::_M_obj, __wait_args_base::_M_obj_size): New
data members.
(__wait_args::__wait_args): Set _M_obj and _M_obj_size on
construction.
(__wait_args::_M_setup_wait): Change void* parameter to deduced
type. Adjust bit_cast to work for types of different sizes.
(__wait_args::_M_load_proxy_wait_val): Remove function, replace
with ...
(__wait_args::_M_setup_proxy_wait): New function.
(__wait_args::_S_flags_for): Do not set __proxy_wait flag.
(__atomic_wait_address_v): Adjust assertion to check that
__platform_wait_uses_type is true.
* src/c++20/atomic.cc (_GLIBCXX_HAVE_PLATFORM_WAIT): Define here
instead of in header. Check _GLIBCXX_HAVE_PLATFORM_WAIT instead
of _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT.
(__platform_wait, __platform_notify, __platform_wait_until): Add
unused parameter for _M_obj_size.
(__spin_impl): Adjust for 64-bit __wait_args_base::_M_old.
(use_proxy_wait): New function.
(__wait_args::_M_load_proxy_wait_val): Replace with ...
(__wait_args::_M_setup_proxy_wait): New function. Call
use_proxy_wait to decide at runtime whether to wait on the
pointer directly instead of using a proxy. If a proxy is needed,
set _M_obj and _M_obj_size to refer to its _M_ver member. Adjust
for change to type of _M_old.
(__wait_impl): Wait on _M_obj unconditionally. Pass _M_obj_size
to __platform_wait.
(__notify_impl): Call use_proxy_wait to decide whether to notify
on the address parameter or a proxy.
(__spin_until_impl): Adjust for change to type of _M_val.
(__wait_until_impl): Wait on _M_obj unconditionally. Pass
_M_obj_size to __platform_wait_until.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Wed, 26 Nov 2025 14:44:03 +0000 (14:44 +0000)]
libstdc++: Fix std::counting_semaphore<> default max value
My recent (uncommitted) changes to support a 64-bit __platform_wait_t
for FreeBSD and Darwin revealed a problem in std::counting_semaphore.
When the default template argument is used and __platform_wait_t is a
64-bit type, the numeric_limits<__platform_wait_t>::max() value doesn't
fit in ptrdiff_t and so we get ptrdiff_t(-1), which fails a
static_assert in the class body.
The solution is to cap the value to PTRDIFF_MAX instead of allowing it
to go negative.
libstdc++-v3/ChangeLog:
* include/bits/semaphore_base.h (__platform_semaphore::_S_max):
Limit to PTRDIFF_MAX to avoid negative values.
* testsuite/30_threads/semaphore/least_max_value.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
liuhongt [Tue, 25 Nov 2025 05:33:46 +0000 (21:33 -0800)]
Refactor mgather/mscatter implementation.
The current implementation is an alias to -mtune-ctrl=
(Alias(mtune-ctrl=, use_gather, ^use_gather)) and may be overridden by
another -mtune-ctrl=, i.e. -mgather -mscatter will only enable mscatter.
This patch fixes the issue.
gcc/ChangeLog:
* config/i386/i386-options.cc (set_ix86_tune_features): Set
gather/scatter tune if OPTION_SET_P.
* config/i386/i386.opt: Refactor mgather/mscatter.
Lulu Cheng [Mon, 24 Nov 2025 09:03:49 +0000 (17:03 +0800)]
LoongArch: fmv: Fix compilation errors when using glibc versions earlier than 2.38.
The macros HWCAP_LOONGARCH_LSX and HWCAP_LOONGARCH_LASX were defined
in glibc 2.38. However, r16-5155 uses these two macros directly
without checking whether they are defined. This causes errors when
compiling libgcc with glibc versions earlier than 2.38.
gcc/ChangeLog:
* doc/extend.texi: Remove the incorrect prompt message.
libgcc/ChangeLog:
* config/loongarch/cpuinfo.c (HWCAP_LOONGARCH_LSX): Define
it if it is not defined.
(HWCAP_LOONGARCH_LASX): Likewise.
Sandra Loosemore [Thu, 27 Nov 2025 01:25:33 +0000 (01:25 +0000)]
doc: Add --compile-std-module to option summary
Commit 3ad2e2d707c3d6b0c6bd8c3ef0df4f7aaee1c3c added documentation
for this new C++ option, but missed adding it to the corresponding
Option Summary list.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_maxof_or_minof_expression): New function.
(c_parser_unary_expression): Add RID_MAXOF & RID_MINOF cases.
* c-tree.h (c_expr_maxof_type): New prototype.
(c_expr_minof_type): New prototype.
* c-typeck.cc (c_expr_maxof_type): New function.
(c_expr_minof_type): New function.
gcc/testsuite/ChangeLog:
* gcc.dg/maxof-bitint.c: New test.
* gcc.dg/maxof-bitint575.c: New test.
* gcc.dg/maxof-compile.c: New test.
* gcc.dg/maxof-pedantic-errors.c: New test.
* gcc.dg/maxof-pedantic.c: New test.
Tamar Christina [Wed, 26 Nov 2025 22:00:07 +0000 (22:00 +0000)]
middle-end: guard against non-single use compares in emit_cmp_and_jump_insns
When I wrote this optimization my patch stack included a change in
tree-out-of-ssa that would duplicate the compares such that the
use is always single use and get_gimple_for_ssa_name would always
succeed.
However I have dropped that for GCC 16 since I didn't expect the
vectorizer to be able to produce duplicate uses of the same
compare results.
But I overlooked that it can arise by other means. So this patch simply
checks that get_gimple_for_ssa_name succeeds for the LEN cases.
The non-LEN cases already check it earlier on.
To still get the optimization in this case the tree-out-of-ssa
change is needed, which is staged for next stage-1.
gcc/ChangeLog:
* optabs.cc (emit_cmp_and_jump_insns): Check for non-single use.
Jeff Law [Wed, 26 Nov 2025 21:52:11 +0000 (14:52 -0700)]
[RISC-V][PR rtl-optimization/122735] Avoid bogus calls to simplify_subreg
Recent changes to simplify_binary_operation_1 reassociate a SUBREG expression
in useful ways. But they fail to account for the asserts at the beginning of
simplify_subreg.
In particular, simplify_subreg asserts that the mode cannot be VOID or
BLK -- the former being the problem here as it's used on CONST_INT
nodes which may appear in an unsimplified REG_EQUAL note.
That triggers the new code in simplify-rtx to push the subreg into an inner
object. In particular it'll try to push the subreg to the first operand of the
LSHIFTRT. We pass that to simplify_subreg via simplify_gen_subreg and boom!
You could legitimately ask why the original note wasn't simplified further or
removed. That approach could certainly be used to fix this specific problem.
But we've never had that kind of requirement on REG_EQUAL notes and I think it
opens up a huge can of worms if we impose it now. So I chose to make the
newer simplify-rtx code more robust.
Bootstrapped and regression tested on x86_64 and riscv and tested on the
various embedded targets without regressions. I'll wait for the pre-commit CI
tester before committing.
PR rtl-optimization/122735
gcc/
* simplify-rtx.cc (simplify_binary_operation_1): When moving a SUBREG
from an outer expression to an inner operand, make sure to avoid
trying to create invalid SUBREGs.
Declare target's 'link' clause disallows 'nohost'; check for it.
Additionally, some other cleanups have been done.
The 'local' clause to 'declare target' is now supported in the FE,
but a 'sorry, unimplemented' is printed at TREE generation time.
This commit also adds the 'groupprivate' directive, which implies
'declare target' with the 'local' clause. And for completeness also
the 'dyn_groupprivate' clause to 'target'. However, all those new
features will eventually print 'sorry, unimplemented' for now.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_attr): Handle OpenMP's 'local' clause
and the 'groupprivate' directive.
(show_omp_clauses): Handle dyn_groupprivate.
* frontend-passes.cc (gfc_code_walker): Walk dyn_groupprivate.
* gfortran.h (enum gfc_statement): Add ST_OMP_GROUPPRIVATE.
(enum gfc_omp_fallback, gfc_add_omp_groupprivate,
gfc_add_omp_declare_target_local): New.
* match.h (gfc_match_omp_groupprivate): New.
* module.cc (enum ab_attribute, mio_symbol_attribute, load_commons,
write_common_0): Handle 'groupprivate' + declare target's 'local'.
* openmp.cc (gfc_omp_directives): Add 'groupprivate'.
(gfc_free_omp_clauses): Free dyn_groupprivate.
(enum omp_mask2): Add OMP_CLAUSE_LOCAL and OMP_CLAUSE_DYN_GROUPPRIVATE.
(gfc_match_omp_clauses): Match them.
(OMP_TARGET_CLAUSES): Add OMP_CLAUSE_DYN_GROUPPRIVATE.
(OMP_DECLARE_TARGET_CLAUSES): Add OMP_CLAUSE_LOCAL.
(gfc_match_omp_declare_target): Handle groupprivate + fixes.
(gfc_match_omp_threadprivate): Move code to, and now call, ...
(gfc_match_omp_thread_group_private): ... this new function.
Also handle groupprivate.
(gfc_match_omp_groupprivate): New.
(resolve_omp_clauses): Resolve dyn_groupprivate.
* parse.cc (decode_omp_directive): Match groupprivate.
(case_omp_decl, parse_spec, gfc_ascii_statement): Handle it.
* resolve.cc (resolve_symbol): Handle groupprivate.
* symbol.cc (gfc_check_conflict, gfc_copy_attr): Handle 'local'
and 'groupprivate'.
(gfc_add_omp_groupprivate, gfc_add_omp_declare_target_local): New.
* trans-common.cc (build_common_decl,
accumulate_equivalence_attributes): Print 'sorry' for
groupprivate and declare target's local.
* trans-decl.cc (add_attributes_to_decl): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Print 'sorry' for
dyn_groupprivate.
(fallback): Process declare target with link/local as
done for 'enter'.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/crayptr2.f90: Move dg-error line.
* gfortran.dg/gomp/declare-target-2.f90: Extend.
* gfortran.dg/gomp/declare-target-4.f90: Update comment,
enable one test.
* gfortran.dg/gomp/declare-target-5.f90: Update dg- wording,
add new test.
* gfortran.dg/gomp/declare-target-indirect-2.f90: Expect
'device_type(any)' in scan-tree-dump.
* gfortran.dg/gomp/declare-target-6.f90: New test.
* gfortran.dg/gomp/dyn_groupprivate-1.f90: New test.
* gfortran.dg/gomp/dyn_groupprivate-2.f90: New test.
* gfortran.dg/gomp/groupprivate-1.f90: New test.
* gfortran.dg/gomp/groupprivate-2.f90: New test.
* gfortran.dg/gomp/groupprivate-3.f90: New test.
* gfortran.dg/gomp/groupprivate-4.f90: New test.
* gfortran.dg/gomp/groupprivate-5.f90: New test.
* gfortran.dg/gomp/groupprivate-6.f90: New test.
Marek Polacek [Mon, 24 Nov 2025 22:31:22 +0000 (17:31 -0500)]
c++: fix crash with pack indexing in noexcept [PR121325]
In my r15-6792 patch I added a call to tsubst in tsubst_pack_index
to fully instantiate args#N in the pack.
Here we are in an unevaluated context, but since the pack is
a TREE_VEC, we call tsubst_template_args which has cp_evaluated
at the beginning. That causes a crash because we trip on the
assert in tsubst_expr/PARM_DECL:
gcc_assert (cp_unevaluated_operand);
because retrieve_local_specialization didn't find anything (because
there are no local_specializations yet).
We can avoid the cp_evaluated by calling the new tsubst_tree_vec,
which creates a new TREE_VEC and substitutes each element.
PR c++/121325
gcc/cp/ChangeLog:
* pt.cc (tsubst_tree_vec): New.
(tsubst_pack_index): Call it.
The CBZ and CBNZ instructions have a very small range (just 128 bytes
forwards). The compiler knows how to handle cases where we exceed
that, but only if the range remains within what a conditional branch
can support. When compiling some machine-generated code it is not too
difficult to exceed this limit, so arrange to fall back to a
conditional branch over an unconditional one in this extreme case.
gcc/ChangeLog:
PR target/122867
* config/arm/arm.cc (arm_print_operand): Use %- to
emit LOCAL_LABEL_PREFIX.
(arm_print_operand_punct_valid_p): Allow %- for punct
and make %_ valid for all compilation variants.
* config/arm/thumb2.md (*thumb2_cbz): Handle very
large branch ranges that exceed the limit of b<cond>.
(*thumb2_cbnz): Likewise.
gcc/testsuite/ChangeLog:
PR target/122867
* gcc.target/arm/cbz-range.c: New test.
The following avoids re-calling of vect_need_peeling_or_partial_vectors_p
after peeling. This was necessary because the function does not
properly handle being called for epilogues since it looks for the
applied prologue peeling not in the main vector loop but the current
one operated on.
PR tree-optimization/110571
* tree-vectorizer.h (vect_need_peeling_or_partial_vectors_p): Remove.
* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
Fix when called on epilog loops. Make static.
* tree-vect-loop-manip.cc (vect_do_peeling): Do not
re-compute LOOP_VINFO_PEELING_FOR_NITER.
In emit_cmp_and_jump_insns I tried to detect if the operation is signed or
unsigned in order to convert the condition code into an unsigned code.
However I did this based on the incoming tree compare, which is done on the
boolean result. Since booleans are always signed in trees, the result
was that we never used an unsigned compare when needed.
This checks one of the arguments of the compare instead.
Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
PR tree-optimization/122861
* optabs.cc (emit_cmp_and_jump_insns): Check argument instead of result.
gcc/testsuite/ChangeLog:
PR tree-optimization/122861
* gcc.target/aarch64/sve/vect-early-break-cbranch_10.c: New test.
* gcc.target/aarch64/sve/vect-early-break-cbranch_11.c: New test.
* gcc.target/aarch64/sve/vect-early-break-cbranch_12.c: New test.
* gcc.target/aarch64/sve/vect-early-break-cbranch_13.c: New test.
* gcc.target/aarch64/sve/vect-early-break-cbranch_14.c: New test.
* gcc.target/aarch64/sve/vect-early-break-cbranch_15.c: New test.
* gcc.target/aarch64/sve/vect-early-break-cbranch_9.c: New test.
* gcc.target/aarch64/vect-early-break-cbranch_4.c: New test.
* gcc.target/aarch64/vect-early-break-cbranch_5.c: New test.
Jakub Jelinek [Wed, 26 Nov 2025 14:01:11 +0000 (15:01 +0100)]
Change the default C++ dialect to gnu++20
On Mon, Nov 03, 2025 at 01:34:28PM -0500, Marek Polacek via Gcc wrote:
> I would like us to declare that C++20 is no longer experimental and
> change the default dialect to gnu++20. Last time we changed the default
> was over 5 years ago in GCC 11:
> <https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=0801f419440c14f6772b28f763ad7d40f7f7a580>
> and before that in 2015 in GCC 6.1, so this happens roughly every 5 years.
>
> I had been hoping to move to C++20 in GCC 15 (see bug 113920), but at that time
> libstdc++ still had incomplete C++20 support and the compiler had issues to iron
> out (mangling of concepts, modules work, etc.). Are we ready now? Is anyone
> aware of any blockers? Presumably we still wouldn't enable Modules by default.
>
> I'm willing to do the work if we decide that it's time to switch the default
> C++ dialect (that includes updating cxx-status.html and adding a new caveat to
> changes.html).
I haven't seen a patch posted for this, so, just so that something is posted
during stage1 if we decide to do it, here is a patch.
The patch makes -std=gnu++20 the default C++ dialect and documents that
-fmodules is still not implied by that or -std=c++20 and modules support
is still experimental.
2025-11-26 Jakub Jelinek <jakub@redhat.com>
gcc/
* doc/invoke.texi (gnu++17): Remove comment about the default.
(c++20): Remove note about experimental support, except add a note
that modules are still experimental and need to be enabled separately.
(gnu++20): Likewise. Move here comment about the default.
(fcoroutines): Mention it is enabled by default for C++20 and later.
* doc/standards.texi: Document that the default for C++ is
-std=gnu++20.
gcc/c-family/
* c-opts.cc (c_common_init_options): Call set_std_cxx20 rather than
set_std_cxx17.
* c.opt (std=c++2a): Change description to deprecated option wording.
(std=c++20): Remove experimental support part.
(std=c++2b): Change description to deprecated option wording.
(std=gnu++2a): Likewise.
(std=gnu++20): Remove experimental support part.
(std=gnu++2b): Change description to deprecated option wording.
gcc/testsuite/
* lib/target-supports.exp: Set cxx_default to c++20 rather than
c++17.
* lib/g++-dg.exp (g++-std-flags): Reorder list to put 20 first
and 17 after 26.
* g++.dg/debug/pr80461.C (bar): Use v = v + 1; instead of ++v;.
* g++.dg/debug/pr94459.C: Add -std=gnu++17 to dg-options.
* g++.dg/diagnostic/virtual-constexpr.C: Remove dg-skip-if;
use { c++11 && c++17_down } effective target instead of
c++11.
* g++.dg/guality/pr67192.C: Add -std=gnu++17.
* g++.dg/torture/pr84961-1.C: Likewise.
* g++.dg/torture/pr84961-2.C: Likewise.
* g++.dg/torture/pr51482.C (anim_track_bez_wvect::tangent): Cast
key_class to int before multiplying it by float.
* g++.dg/torture/stackalign/unwind-4.C (foo): Use g_a = g_a + 1;
instead of g_a++;.
* g++.dg/tree-prof/partition1.C (bar): Use l = l + 1; return l;
instead of return ++l;.
* obj-c++.dg/exceptions-3.mm: Add -std=gnu++17.
* obj-c++.dg/exceptions-5.mm: Likewise.
libgomp/
* testsuite/libgomp.c++/atomic-12.C (main): Add ()s around array
reference index.
* testsuite/libgomp.c++/atomic-13.C: Likewise.
* testsuite/libgomp.c++/atomic-8.C: Likewise.
* testsuite/libgomp.c++/atomic-9.C: Likewise.
* testsuite/libgomp.c++/loop-6.C: Use count = count + 1;
return count > 0; instead of return ++count > 0;.
* testsuite/libgomp.c++/pr38650.C: Add -std=gnu++17.
* testsuite/libgomp.c++/target-lambda-1.C (merge_data_func):
Use [=,this] instead of just [=] in lambda captures.
* testsuite/libgomp.c-c++-common/target-40.c (f1): Use v += 1;
instead of v++;.
* testsuite/libgomp.c-c++-common/depend-iterator-2.c: Use v = v + 1;
instead of v++.
Tomasz Kamiński [Thu, 13 Nov 2025 13:54:11 +0000 (14:54 +0100)]
libstdc++: Optimize functor storage for transform views iterators.
The iterators for transform views (views::transform, views::zip_transform,
and views::adjacent_transform) now store a function handle (from the
__detail::__func_handle namespace) instead of a pointer to the view object
(_M_parent).
The following handle templates are defined in the __func_handle namespace:
* _Inplace: Used if the functor is a function pointer or a standard operator
wrapper (std::less<>, etc.). The functor is stored directly in __func_handle
and the iterator. This avoids double indirection through a pointer to the
function pointer, and reduces the size of the iterator for std wrappers.
* _InplaceMemPtr: Used for data or function member pointers. This behaves
similarly to _Inplace, but uses __invoke for invocations.
* _StaticCall: Used if the operator() selected by overload resolution
for the iterator reference is static. In this case, __func_handle is empty,
reducing the iterator size.
* _ViaPointer: Used for all remaining cases. __func_handle stores a pointer
to the functor object stored within the view. Only for this template is the
cv-qualification of the functor template parameter (_Fn) relevant, and
specializations for both const and mutable types are generated.
As a consequence of these changes, the iterators of transform views no longer
depend on the view object when a handle other than __func_handle::_ViaPointer
is used. The corresponding views are nevertheless not marked as
borrowed_range, as they are not marked as such in the standard.
The use of _Inplace is limited to the set of pre-C++20 standard functors,
as for the ones introduced later, operator() was retroactively made static.
We do not extend it to arbitrary empty functors, as their operator() may
still depend on the value of the this pointer, as illustrated by test12 in
the std/ranges/adaptors/transform.cc test file.
Storing function member pointers directly increases the iterator size in that
specific case, but this is deemed beneficial for consistent treatment of
function and data member pointers.
To avoid materializing temporaries when the underlying iterator(s) return a
prvalue, the _M_call_deref and _M_call_subscript methods of handles are
defined to accept the iterator(s), which are then dereferenced as arguments
of the functor.
Using _Fd::operator()(*__iters...) inside a requires-expression is only
supported since clang-20; however, by the time of the GCC 16 release,
clang-22 should already be available.
libstdc++-v3/ChangeLog:
* include/std/ranges (__detail::__is_std_op_template)
(__detail::__is_std_op_wrapper, __func_handle::_Inplace)
(__func_handle::_InplaceMemPtr, __func_handle::_ViaPointer)
(__func_handle::_StaticCall, __detail::__func_handle_t): Define.
(transform_view::_Iterator, zip_transform_view::_Iterator)
(adjacent_transform_view::_Iterator): Replace pointer to view
(_M_parent) with pointer to functor (_M_fun). Update constructors
to construct _M_fun from *__parent->_M_fun. Define operator* and
operator[] in terms of _M_call_deref and _M_call_subscript.
* testsuite/std/ranges/adaptors/adjacent_transform/1.cc: New tests.
* testsuite/std/ranges/adaptors/transform.cc: New tests.
* testsuite/std/ranges/zip_transform/1.cc: New tests.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Thu, 20 Nov 2025 10:23:30 +0000 (11:23 +0100)]
libstdc++: Make C++20's operator wrappers' operator() static.
The operator() for function objects introduced in C++20 (e.g., std::identity,
std::compare_three_way, std::ranges::equal) is now defined as static.
Although static operator() is a C++23 feature, it is supported in C++20 by
both GCC and clang (since their support was added in clang-16).
This change is not user-observable, as all affected operators are template
functions. Taking the address of such an operator requires casting to a pointer
to member function with a specific signature. The exact signature is unspecified
per C++20 [member.functions] p2 (e.g. due to potential parameters with default
arguments).
libstdc++-v3/ChangeLog:
* include/bits/ranges_cmp.h (std::identity::operator()):
(ranges::equal_to::operator(), ranges::not_equal_to::operator())
(ranges::greater::operator(), ranges::greater_equal::operator())
(ranges::less::operator(), ranges::less_equal::operator()):
Declare as static.
* libsupc++/compare (std::compare_three_way::operator()):
Declare as static.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jakub Jelinek [Wed, 26 Nov 2025 10:05:42 +0000 (11:05 +0100)]
eh: Invoke cleanups/destructors in asm goto jumps [PR122835]
The eh pass lowers try { } finally { } stmts and handles
in there e.g. GIMPLE_GOTOs or GIMPLE_CONDs which jump from
within the try block out of that by redirecting the jumps
to an artificial label with code to perform the cleanups/destructors
and then continuing the goto, ultimately to the original label.
Now, for computed gotos and non-local gotos, we document that we don't
invoke destructors (or cleanups), as that is something we really can't
handle; similarly for longjmp.
This PR is about asm goto though, and in that case I don't see why
we shouldn't be performing the cleanups. While the user doesn't
specify which particular label will be jumped to, asm goto is more
like GIMPLE_COND (i.e. a conditional goto) than an unconditional
GIMPLE_GOTO: even with potentially more possible destinations, there
is still a list of the potential labels, and we can adjust some or all
of them to artificial labels that perform the cleanups and continue the
jump towards the user label. We know from where the jumps go (the
asm goto) and to where (the different LABEL_DECLs).
So, the following patch handles asm goto in the eh pass similarly to
GIMPLE_COND and GIMPLE_GOTO.
Jakub Jelinek [Wed, 26 Nov 2025 09:57:37 +0000 (10:57 +0100)]
match.pd: Use get_range_query (cfun) in more simplifications and pass current stmt to range_of_expr [PR119683]
The following testcase regressed with r13-3596 which switched over
vrp1 to ranger vrp. Before that, I believe vrp1 was registering
SSA_NAMEs with ASSERT_EXPRs at the start of bbs and so even when
querying the global ranges from match.pd patterns during the vrp1
pass, they saw the local ranges for a particular bb rather than global
ranges. In ranger vrp that doesn't happen anymore, so we need to
pass a stmt to range_of_expr if we want the local ranges rather
than global ones, and we should be using get_range_query (cfun)
instead of get_global_range_query () (most patterns actually use
the former already). For the stmt, the following patch attempts
to pass the innermost stmt on which that particular capture appears
as an operand; but because some passes use match.pd folding on
expressions not yet in the IL, I've added a helper function which
tries to determine from a capture of the LHS operation whether it
is an SSA_NAME whose SSA_NAME_DEF_STMT is currently in the IL, and
only queries the ranger with that stmt in that case, otherwise with
NULL (i.e. what it has been using before).
2025-11-26 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119683
* gimple-match.h (gimple_match_ctx): New inline function.
* match.pd ((mult (plus:s (mult:s @0 @1) @2) @3)): Capture
PLUS, use get_range_query (cfun) instead of
get_global_range_query () and pass gimple_match_ctx (@5)
as 3rd argument to range_of_expr.
((plus (mult:s (plus:s @0 @1) @2) @3)): Similarly for MULT,
with @4 instead of @5.
((t * u) / u -> t): Similarly with @2 instead of @4.
((t * u) / v -> t * (u / v)): Capture MULT, pass gimple_match_ctx (@3)
as 3rd argument to range_of_expr.
((X + M*N) / N -> X / N + M): Pass gimple_match_ctx (@3) or
gimple_match_ctx (@4) as 3rd arg to some range_of_expr calls.
((X - M*N) / N -> X / N - M): Likewise.
((X + C) / N -> X / N + C / N): Similarly.
(((T)(A)) + CST -> (T)(A + CST)): Capture CONVERT, use
get_range_query (cfun) instead of get_global_range_query ()
and pass gimple_match_ctx (@2) as 3rd argument to range_of_expr.
(x_5 == cstN ? cst4 : cst3): Capture EQNE and pass
gimple_match_ctx (@4) as 3rd argument to range_of_expr.
Soumya AR [Wed, 16 Jul 2025 13:32:08 +0000 (06:32 -0700)]
aarch64: Script to auto generate JSON tuning routines
This commit introduces a Python maintenance script that generates C++ code
for parsing and serializing AArch64 JSON tuning parameters based on the
schema defined in aarch64-json-schema.h.
The script generates two include files:
- aarch64-json-tunings-parser-generated.inc
- aarch64-json-tunings-printer-generated.inc
These generated files are committed as regular source files and included by
aarch64-json-tunings-parser.cc and aarch64-json-tunings-printer.cc respectively.
----
Additionally, this commit centralizes tuning enum definitions into a new
aarch64-tuning-enums.def file. The enums (autoprefetch_model and ldp_stp_policy)
are now defined once using macros and consumed by both the core definitions
(aarch64-opts.h, aarch64-protos.h) and the generated parser/printer code.
Doing this ensures that if someone wants to add a new enum value, they only
need to make modifications in the .def file, and the codegen from the script
will automatically refer to the same enums.
----
The script is run automatically whenever the JSON schema is modified.
----
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
* config/aarch64/aarch64-json-tunings-parser.cc: Include
aarch64-json-tunings-parser-generated.inc.
* config/aarch64/aarch64-json-tunings-printer.cc: Include
aarch64-json-tunings-printer-generated.inc.
* config/aarch64/aarch64-opts.h (AARCH64_LDP_STP_POLICY): Use
aarch64-tuning-enums.def.
* config/aarch64/aarch64-protos.h (AARCH64_AUTOPREFETCH_MODE): Use
aarch64-tuning-enums.def.
* config/aarch64/t-aarch64: Invoke
aarch64-generate-json-tuning-routines.py if the schema is modified.
* config/aarch64/aarch64-generate-json-tuning-routines.py: New
maintenance script to generate JSON parser/printer routines.
* config/aarch64/aarch64-json-tunings-parser-generated.inc: New file.
* config/aarch64/aarch64-json-tunings-printer-generated.inc: New file.
* config/aarch64/aarch64-tuning-enums.def: New file.
Soumya AR [Wed, 16 Jul 2025 13:31:33 +0000 (06:31 -0700)]
aarch64: Regression tests for parsing of user-provided AArch64 CPU tuning parameters
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/aarch64-json-tunings/aarch64-json-tunings.exp: New test.
* gcc.target/aarch64/aarch64-json-tunings/boolean-1.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/boolean-1.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/boolean-2.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/boolean-2.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/empty-brackets.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/empty-brackets.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/empty.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/empty.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/enum-1.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/enum-1.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/enum-2.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/enum-2.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/integer-1.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/integer-1.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/integer-2.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/integer-2.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/integer-3.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/integer-3.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/string-1.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/string-1.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/string-2.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/string-2.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/test-all.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/test-all.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/unidentified-key.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/unidentified-key.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/unsigned-1.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/unsigned-1.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/unsigned-2.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/unsigned-2.json: New test.
* gcc.target/aarch64/aarch64-json-tunings/unsigned-3.c: New test.
* gcc.target/aarch64/aarch64-json-tunings/unsigned-3.json: New test.
Soumya AR [Wed, 16 Jul 2025 13:29:57 +0000 (06:29 -0700)]
aarch64: Enable parsing of user-provided AArch64 CPU tuning parameters
This patch adds support for loading custom CPU tuning parameters from a JSON
file for AArch64 targets. The '-muser-provided-CPU=' flag accepts a user
provided JSON file and overrides the internal tuning parameters at GCC runtime.
This patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
* config.gcc: Add aarch64-json-tunings-parser.o.
* config/aarch64/aarch64.cc (aarch64_override_options_internal): Invoke
aarch64_load_tuning_params_from_json if -muser-provided-CPU= is
specified.
(aarch64_json_tunings_tests): Extern aarch64_json_tunings_tests().
(aarch64_run_selftests): Add aarch64_json_tunings_tests().
* config/aarch64/aarch64.opt: New option.
* config/aarch64/t-aarch64 (aarch64-json-tunings-parser.o): New define.
* config/aarch64/aarch64-json-schema.h: New file.
* config/aarch64/aarch64-json-tunings-parser.cc: New file.
* config/aarch64/aarch64-json-tunings-parser.h: New file.
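The commit does not reproduce the schema here, so the following file is purely illustrative: the key names are hypothetical, and the authoritative field list lives in aarch64-json-schema.h.

```json
{
  "tune_params": {
    "issue_rate": 4,
    "autoprefetcher_model": "AUTOPREFETCHER_WEAK"
  }
}
```

Such a file would then be passed via the new flag, e.g. gcc -muser-provided-CPU=tuning.json (the flag name is from the commit above).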
Soumya AR [Fri, 11 Jul 2025 12:54:33 +0000 (05:54 -0700)]
json: Add get_map() method to JSON object class
This patch adds a get_map () method to the JSON object class to provide access
to the underlying hash map that stores the JSON key-value pairs.
To do this, we expose the map_t typedef, the return type of get_map().
This change is needed to allow traversal of key-value pairs when parsing
user-provided JSON tuning data.
Additionally, is_a_helper template specializations for json::literal * and
const json::literal * were added to make dynamic casting in the next patch
easier.
This patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
Soumya AR [Fri, 11 Jul 2025 12:28:17 +0000 (05:28 -0700)]
aarch64: Enable dumping of AArch64 CPU tuning parameters to JSON
This patch adds functionality to dump AArch64 CPU tuning parameters to a JSON
file. The new '-fdump-tuning-model=' flag allows users to export the current
tuning model configuration to a JSON file.
This patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
* config.gcc: Add aarch64-json-tunings-printer.o.
* config/aarch64/aarch64.cc (aarch64_override_options_internal): Invoke
aarch64_print_tune_params if -fdump-tuning-model= is specified.
* config/aarch64/aarch64.opt: New option.
* config/aarch64/t-aarch64 (aarch64-json-tunings-printer.o): New define.
* config/aarch64/aarch64-json-tunings-printer.cc: New file.
* config/aarch64/aarch64-json-tunings-printer.h: New file.
Soumya AR [Thu, 10 Jul 2025 10:52:00 +0000 (03:52 -0700)]
aarch64 + arm: Remove const keyword from tune_params members and nested members
To allow runtime updates to tuning parameters, the const keyword is removed from
the members of the tune_params structure and the members of its nested
structures.
Since this patch also touches tuning structures in the arm backend, it was
bootstrapped on aarch64-linux-gnu as well as arm-linux-gnueabihf.
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
Tomasz Kamiński [Wed, 19 Nov 2025 09:29:18 +0000 (10:29 +0100)]
libstdc++: Hashing support for chrono value classes [PR110357]
This patch implements P2592R3, hashing support for std::chrono value classes.
To avoid the known issues with the current hashing of integer types (see
PR104945), we use a chrono::__int_hash function that hashes the bytes of the
representation, instead of hash<T>, as the latter simply casts to the value.
Currently _Hash_impl is used, but we should consider replacing it (see
PR55815) before the C++26 ABI is made stable. The function is declared inside
the <chrono> header and the chrono namespace, to make sure that only chrono
components would be affected by such a change. Finally, chrono::__int_hash is
made variadic, to support combining hashes of multiple integers.
To reduce the number of calls to the hasher (defined out of line), the
calendar types are packed into a single unsigned integer value. This is done
by the chrono::__hash helper, which calls:
* chrono::__as_int to cast the value of a single component to an unsigned
integer with a size matching the one used by the internal representation:
unsigned short for year/weekday_indexed, and unsigned char in all other cases,
* chrono::__pack_ints to pack the integers (if more than one) into a single
integer by performing bit shift operations,
* chrono::__int_hash to hash the value produced by the above.
Hashing of duration, time_point, and zoned_time only hashes the value and
ignores any difference in the period, i.e. the hashes of nanoseconds(2) and
seconds(2) are the same. This does not affect the usage inside unordered
containers, as the arguments are converted to the key type first. To address
that, period::num and period::den could be included in the hash; however,
such an approach would not make hashes of equal durations (2000ms, 2s) equal,
so they would remain unusable for precomputed hashes. In consequence,
including the period in the hash would only increase runtime cost without
any clear benefit.
Furthermore, chrono::__int_hash is used when the duration representation
is an integral type, and for other types (floating point, due to the special
handling of +/-0.0, and user-defined types) we delegate to the hash
specialization.
This is automatically picked up by time_point, which delegates to the hasher
of the duration. Similarly, for leap_second, which is specified to use integer
durations, we simply hash the representations of date() and value(). Finally,
for zoned_time, in addition to handling integer durations as described above,
we also use __int_hash for const time_zone* (if used), as hash<T*> has
similar problems as the hash specializations for integers. This is limited
to _TimeZonePtr being const time_zone* (the default), as users can define
hash specializations for raw pointers to their own zone types.
As accessing the representation of a duration requires calling the count()
method, which returns a copy of the representation by value, the noexcept
specification of the hasher needs to take into consideration the copy
constructor of the duration. Similar reasoning applies to time_since_epoch
for time_points, and get_sys_time and get_time_zone for zoned_time.
For all these cases we use the internal __is_nothrow_copy_hashable concept.
Finally, support for zoned_time is provided only for the CXX11 string ABI;
the __cpp_lib_chrono feature test macro cannot be bumped if COW strings are
used. To indicate the presence of hashers for the remaining types, this
patch also bumps the internal __glibcxx_chrono_cxx20 macro, and uses it as
a guard for the new features.
PR libstdc++/110357
libstdc++-v3/ChangeLog:
* include/bits/version.def (chrono, chrono_cxx20): Bump values.
* include/bits/version.h: Regenerate.
* include/std/chrono (__is_nothrow_copy_hashable)
(chrono::__pack_ints, chrono::__as_int, chrono::__int_hash)
(chrono::__hash): Define.
(std::hash): Define partial specialization for duration, time_point,
and zoned_time, and full specializations for calendar types and
leap_second.
(std::__is_fast_hash): Define partial specializations for duration,
time_point, zoned_time.
* testsuite/std/time/hash.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Co-authored-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Paul Thomas [Wed, 26 Nov 2025 06:59:20 +0000 (06:59 +0000)]
Fortran: Implement finalization PDTs [PR104650]
2025-11-26 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/104650
* decl.cc (gfc_get_pdt_instance): If the PDT template has
finalizers, make a new f2k_derived namespace for this instance
and copy the template namespace into it. Set the instance
template_sym field to point to the template.
* expr.cc (gfc_check_pointer_assign): Allow array value pointer
lvalues to point to scalar null expressions in initialization.
* gfortran.h : Add the template_sym field to gfc_symbol.
* resolve.cc (gfc_resolve_finalizers): For a pdt_type, copy the
final subroutines with the same type argument into the pdt_type
finalizer list. Prevent final subroutine type checking and
creation of the vtab for pdt_templates.
* symbol.cc (gfc_free_symbol): Do not call gfc_free_namespace
for pdt_type with finalizers. Instead, free the finalizers and
the namespace.
gcc/testsuite
PR fortran/104650
* gfortran.dg/pdt_70.f03: New test.
Make better use of overflowing operations in max/min(a, add/sub(a, b)) [PR116815]
This patch folds the following patterns:
- For add:
- umax (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? sum : a
- umin (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? a : sum
... along with the commutated versions:
- umax (a, add (b, a)) -> [sum, ovf] = adds (b, a); !ovf ? sum : a
- umin (a, add (b, a)) -> [sum, ovf] = adds (b, a); !ovf ? a : sum
- For sub:
- umax (a, sub (a, b)) -> [diff, udf] = subs (a, b); udf ? diff : a
- umin (a, sub (a, b)) -> [diff, udf] = subs (a, b); udf ? a : diff
Where ovf is the overflow flag and udf is the underflow flag. adds and subs are
generated by generating parallel compare+plus/minus which map to
add<mode>3_compareC and sub<mode>3_compare1.
This patch is a respin of the patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per
the suggestion to turn it into a target-specific transform by Richard
Biener.
FIXME: This pattern cannot currently factor multiple occurrences of the
add expression into a single adds, e.g. max (a, a + b) + min (a + b, b)
ends up generating two adds instructions. This is something that
was lost when going from GIMPLE to target-specific transforms.
Bootstrapped and regtested on aarch64-unknown-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64.md
(*aarch64_plus_within_<optab><mode>3_<ovf_commutate>): New pattern.
(*aarch64_minus_within_<optab><mode>3): Likewise.
* config/aarch64/iterators.md (ovf_add_cmp): New code attribute.
(udf_sub_cmp): Likewise.
(UMAXMIN): New code iterator.
(ovf_commutate): New iterator.
(ovf_comm_opp): New int attribute.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr116815-1.c: New test.
* gcc.target/aarch64/pr116815-2.c: Likewise.
* gcc.target/aarch64/pr116815-3.c: Likewise.
Pan Li [Mon, 20 Oct 2025 13:08:46 +0000 (21:08 +0800)]
RISC-V: Add testcase for unsigned scalar SAT_MUL form 7
The form 7 of unsigned scalar SAT_MUL has been supported since the
previous change. Thus, add test cases to make sure it
works well.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-8-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-8-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-8-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-8-u16-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-8-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-8-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-8-u32-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-8-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-8-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-8-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-8-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-8-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-8-u8-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-8-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-8-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-8-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-8-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-8-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-8-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-8-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-8-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-8-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-8-u8-from-u64.c: New test.
Pan Li [Tue, 25 Nov 2025 07:18:38 +0000 (15:18 +0800)]
Match: Add unsigned SAT_MUL for form 7
This patch tries to match the unsigned
SAT_MUL form 7, aka below:
#define DEF_SAT_U_MUL_FMT_7(NT, WT) \
NT __attribute__((noinline)) \
sat_u_mul_##NT##_from_##WT##_fmt_7 (NT a, NT b) \
{ \
WT x = (WT)a * (WT)b; \
NT max = -1; \
bool overflow_p = x > (WT)(max); \
return -(NT)(overflow_p) | (NT)x; \
}
where WT is uint128_t, uint64_t, uint32_t or uint16_t, and
NT is uint64_t, uint32_t, uint16_t or uint8_t.
gcc/ChangeLog:
* match.pd: Add patterns for SAT_MUL form 7, including
mul and widen_mul.
Andrew Pinski [Tue, 25 Nov 2025 07:34:45 +0000 (23:34 -0800)]
phiprop: Small compile time improvement for phiprop
Now that post-dominator information is only needed when the new store
can trap (since r16-5555-g952e145796d), only calculate it when
that is the case. The calculation was made on-demand by
r14-2051-g3124bfb14c0bdc; this just changes when we need to
calculate it.
Pushed as obvious.
gcc/ChangeLog:
* tree-ssa-phiprop.cc (propagate_with_phi): Only
calculate on demand post dom info when the new store
might trap.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Tue, 25 Nov 2025 22:19:18 +0000 (14:19 -0800)]
phiprop: Make sure types of the load match the inserted phi [PR122847]
This was introduced with r16-5556-ge94e91d6f3775, but the type
check for the delayed case did not happen because the type at the
point of delaying was set to NULL. The type is only set when the
phi is created for a non-delayed load.
This adds the type check to the replacement for the delayed statements.
Pushed as obvious.
PR tree-optimization/122847
gcc/ChangeLog:
* tree-ssa-phiprop.cc (propagate_with_phi): Add type
check for reuse of the phi for the delayed statements.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr122847-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
This is not ia64-specific, but due to the changes in the recent glibc
commit "Implement C23 const-preserving standard library macros" (i.e.
[2]), "char *q" now needs to be a pointer to const char to compile
without error, because of the return value of strchr().
Frank Scheiner [Sat, 22 Nov 2025 14:58:10 +0100]
libgomp: Fix GCC build after glibc@cd748a6
char *q needs to be a pointer to a const char for the return value of
strchr() with glibc after "Implement C23 const-preserving standard library
macros".
Philip Herron [Mon, 17 Nov 2025 21:14:44 +0000 (21:14 +0000)]
gccrs: Add support for initial generic associated types
This patch is the initial part of supporting generic associated types. In Rust
we have trait item types that get implemented, for example:
trait Foo<T> {
type Bar;
}
impl<T> Foo<T> for T {
type Bar = T;
}
The trait position uses a Ty::Placeholder, which is just a placeholder that is
lazily evaluated to the impl's type alias, which is actually a Ty::Projection.
The projection type needs to hold onto generics in order to properly support
generic types; this GATs support extends that all the way to the placeholder,
which still needs to be done.
Fixes Rust-GCC#4276
gcc/rust/ChangeLog:
* ast/rust-ast.cc (TraitItemType::as_string): add generic params
* ast/rust-ast.h: remove old comment
* ast/rust-item.h: add generic params to associated type
* ast/rust-type.h: remove old comment
* hir/rust-ast-lower-implitem.cc (ASTLowerTraitItem::visit): HIR lowering for GATs
* hir/tree/rust-hir-item.cc (TraitItemType::TraitItemType): gat's on TraitItemType
(TraitItemType::operator=): preserve generic params
* hir/tree/rust-hir-item.h: likewise
* hir/tree/rust-hir.cc (TraitItemType::as_string): likewise
* parse/rust-parse-impl.h (Parser::parse_trait_type): hit the < and parse params
* typecheck/rust-hir-type-check-implitem.cc (TypeCheckImplItemWithTrait::visit): typecheck
* typecheck/rust-tyty.cc (BaseType::has_substitutions_defined): don't destructure
gcc/testsuite/ChangeLog:
* rust/compile/gat1.rs: New test.
* rust/execute/torture/gat1.rs: New test.
Signed-off-by: Philip Herron <herron.philip@googlemail.com>
Owen Avery [Sun, 17 Aug 2025 18:15:35 +0000 (14:15 -0400)]
gccrs: Improve feature handling
This includes a program, written using flex and bison, to extract
information on unstable features from rustc source code and save it to a
header file.
The script does fetch files from https://github.com/rust-lang/rust (the
official rustc git repository), which should be alright, as it's only
intended to be run by maintainers.
See https://doc.rust-lang.org/unstable-book/ for information on unstable
features.
gcc/rust/ChangeLog:
* checks/errors/feature/rust-feature-gate.cc
(FeatureGate::gate): Handle removal of Feature::create.
(FeatureGate::visit): Refer to AUTO_TRAITS as
OPTIN_BUILTIN_TRAITS.
* checks/errors/feature/rust-feature.cc (Feature::create):
Remove.
(Feature::feature_list): New static member variable.
(Feature::name_hash_map): Use "rust-feature-defs.h" to define.
(Feature::lookup): New member function definition.
* checks/errors/feature/rust-feature.h (Feature::State): Add
comments.
(Feature::Name): Use "rust-feature-defs.h" to define.
(Feature::as_string): Make const.
(Feature::name): Likewise.
(Feature::state): Likewise.
(Feature::issue): Likewise.
(Feature::description): Remove member function declaration.
(Feature::create): Remove static member function declaration.
(Feature::lookup): New member function declarations.
(Feature::Feature): Adjust arguments.
(Feature::m_rustc_since): Rename to...
(Feature::m_rust_since): ...here.
(Feature::m_description): Remove.
(Feature::m_reason): New member variable.
(Feature::feature_list): New static member variable.
* checks/errors/feature/rust-feature-defs.h: New file.
* checks/errors/feature/contrib/parse.y: New file.
* checks/errors/feature/contrib/scan.l: New file.
* checks/errors/feature/contrib/.gitignore: New file.
* checks/errors/feature/contrib/Makefile: New file.
* checks/errors/feature/contrib/fetch: New file.
* checks/errors/feature/contrib/regen: New file.
* checks/errors/feature/contrib/copyright-stub.h: New file.
* checks/errors/feature/contrib/README: New file.
Owen Avery [Sat, 28 Jun 2025 01:44:01 +0000 (21:44 -0400)]
gccrs: Create LocalVariable
This should make it easier for us to move away from leaking pointers to
Bvariable everywhere. Since LocalVariable has a single field of type
tree, it should be the same size as a pointer to Bvariable, making the
switch to LocalVariable wherever possible strictly an improvement.
Rainer Orth [Tue, 25 Nov 2025 21:25:48 +0000 (22:25 +0100)]
build: Save/restore CXXFLAGS for zstd tests
I recently encountered a bootstrap failure on trunk caused by the fact
that an older out-of-tree version of ansidecl.h was found before the
in-tree one in $top_srcdir/include, so some macros from that header
that are used in gcc weren't defined.
The out-of-tree version was located in $ZSTD_INC (-I/vol/gcc/include)
which caused that directory to be included in gcc's CXXFLAGS like
CXXFLAGS='-g -O2 -fchecking=1 -I/vol/gcc/include'
causing it to be searched before $srcdir/../include.
I could trace this to the zstd.h test in gcc/configure.ac which sets
CXXFLAGS and LDFLAGS before the actual test, but doesn't reset them
afterwards.
So this patch does just that.
Bootstrapped without regressions on i386-pc-solaris2.11 and
x86_64-pc-linux-gnu.
David Malcolm [Tue, 25 Nov 2025 17:44:10 +0000 (12:44 -0500)]
testsuite: fix issues in gcc.dg/analyzer/strchr-1.c seen with C23 libc
Simplify this test case in the hope of avoiding an error seen
with glibc-2.42.9000-537-gcd748a63ab1 in CI with
"Implement C23 const-preserving standard library macros".
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/strchr-1.c: Drop include of <string.h>, and use
__builtin_strchr throughout rather than strchr to avoid const
correctness issues when the header implements strchr with a C23
const-preserving macro. Drop "const" from two vars.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Tue, 25 Nov 2025 17:43:52 +0000 (12:43 -0500)]
analyzer: add logging to deref_before_check::emit
gcc/analyzer/ChangeLog:
* sm-malloc.cc (deref_before_check::emit): Add logging of the
various conditions for late-rejection of a
-Wanalyzer-deref-before-check warning.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Currently we use an Adv. SIMD compare and a reduction sequence to implement
an early break. This patch implements the new optabs vec_cbranch_any and
vec_cbranch_all in order to replace the Adv. SIMD compare and reduction with
an SVE flag-setting compare.
This optab could also be used for optimizing the Adv. SIMD sequence when SVE
is not available. I have a separate patch for that and will send it depending
on whether this approach is accepted.
Note that for floating-point we still need the ptest as floating point SVE
compares don't set flags. In addition because SVE doesn't have a CMTST
equivalent instruction we have to do an explicit AND before the compares.
These two cases don't have a speed advantage, but do have a codesize one
so I've left them enabled.
This patch also eliminates the PTEST on a normal SVE compare and branch through
the introduction of the new optabs cond_vec_cbranch_any and cond_vec_cbranch_all.
In the example

#define N 1024
int a[N], b[N];

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
        break;
    }
}

the ptest isn't needed since the branch only cares about the Z and NZ flags.
GCC today supports eliding this through the pattern *cmp<cmp_op><mode>_ptest;
however, this pattern only supports the removal when the outermost context is a
CMP where the predicate is inside the condition itself.
This typically only happens for an unpredicated CMP, as a ptrue will be
generated during expand. For a predicated CMP the loop mask is instead applied
to the compare as an AND.
The loop mask is moved into the compare by the pattern *cmp<cmp_op><mode>_and,
which moves the mask inside if the current mask is a ptrue, since
p && true -> p.
However, this happens after combine, and so we can't both move the predicate
inside and eliminate the ptest.
To fix this the middle-end will now rewrite the mask into the compare optab
and indicate that only the CC flags are required. This allows us to simply
not generate the ptest at all, rather than trying to eliminate it later on.
Tamar Christina [Tue, 25 Nov 2025 12:51:52 +0000 (12:51 +0000)]
vect: check support for gcond with {cond{_len}_}vec_cbranch_{any|all} optabs [PR118974]
This change allows a target to only implement the explicit vec_cbranch optabs.
To do this the vectorizer is updated to check for the new optabs directly.
Targets that have a different type for BOOLEAN_VECTOR_TYPE_P for instance
can use only the new optabs.
gcc/ChangeLog:
PR target/118974
* tree-vect-stmts.cc (supports_vector_compare_and_branch): New.
(vectorizable_early_exit): Use it.
Tamar Christina [Tue, 25 Nov 2025 12:51:31 +0000 (12:51 +0000)]
middle-end: support new {cond{_len}_}vec_cbranch_{any|all} optabs [PR118974]
This patch introduces six new vector cbranch optabs
1. vec_cbranch_any and vec_cbranch_all.
2. cond_vec_cbranch_any and cond_vec_cbranch_all.
3. cond_len_vec_cbranch_any and cond_len_vec_cbranch_all.
Today cbranch can be used for both vector and scalar modes. In both these
cases it's intended to compare boolean values, either scalar or vector.
The optab documentation does not however state that it can only handle
comparisons against 0. So many targets have added code for the vector variant
that tries to deal with the case where we branch based on two non-zero
registers.
However this code can't ever be reached because the cbranch expansion only deals
with comparisons against 0 for vectors. This is because for vectors the rest of
the compiler has no way to generate a non-zero comparison. e.g. the vectorizer
will always generate a zero comparison, and the C/C++ front-ends won't allow
vectors to be used in a cbranch as it expects a boolean value. ISAs like SVE
work around this by requiring you to use an SVE PTEST intrinsics which results
in a single scalar boolean value that represents the flag values.
e.g. if (svptest_any (..))
The natural question is why do we not at expand time then rewrite the comparison
to a non-zero comparison if the target supports it.
The reason is we can't safely do so. For an ANY comparison (e.g. a != b) this
is trivial, but for an ALL comparison (e.g. a == b) we would have to both flip
the branch and invert the comparison, i.e. we have to make it an a != b
comparison.
But in emit_cmp_and_jump_insns we can't flip the branches anymore because they
have already been lowered into a fall through branch (PC) and a label, ready for
use in an if_then_else RTL expression.
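The ANY/ALL duality can be sketched in scalar C (a hand-written illustration,
not compiler code): an all-lanes check is only expressible as the inverse of an
any-lanes check, which is why expanding an ALL cbranch requires flipping both
the comparison and the branch.

```c
#include <stdbool.h>

/* "ANY" form: true if any lane differs (a != b).  */
bool any_ne (const int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    if (a[i] != b[i])
      return true;
  return false;
}

/* "ALL" form: all lanes equal.  Only expressible as the inverse of the
   ANY form, hence the need to invert both comparison and branch.  */
bool all_eq (const int *a, const int *b, int n)
{
  return !any_ne (a, b, n);
}
```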
Now why does any of this matter? Well there are three optimizations we want to be
able to do.
1. Adv. SIMD does not support a vector !=, as in there's no instruction for it.
For both integer and FP vectors we perform the comparison as EQ and then
invert the resulting mask. Ideally we'd like to replace this with just a XOR
and the appropriate branch.
2. When on an SVE enabled system we would like to use an SVE compare + branch
for the Adv. SIMD sequence which could happen due to cost modelling. However
we can only do so based on if we know that the values being compared against
are the boolean masks. This means we can't really use combine to do this,
since combine would have to match the entire sequence including the vector
comparisons: at the RTL level we've lost the information that
VECTOR_BOOLEAN_P would have given us. This sequence would be too long for
combine to match due to it having to match the compare + branch sequence
being generated as well. It also becomes a bit messy to match ANY and ALL
sequences.
3. For SVE systems we would like to avoid generating the PTEST operation
whenever possible. Because SVE vector integer comparisons already set flags
we don't need the PTEST on an any or all check. Eliminating this in RTL is
difficult, so the best approach is to not generate the PTEST at all when not
needed.
To handle these three cases the new optabs are added and the current cbranch is
no longer required if the target does not need help in distinguishing between
boolean vector vs data vector operands.
This difference is not important for correctness, but it is for optimization.
So I've chosen not to deprecate the cbranch_optab but make it completely optional.
I'll try to explain why:
An example is when unrolling is done on Adv. SIMD early break loops.
In that case the new optabs aren't immediately useful because the comparisons
can't be done by the optab itself.
As such vec_cbranch_any would be called with vexit_reduc_34 and { 0, 0, 0, 0 };
however, since this expects to perform the comparison itself, we would end up
with a redundant compare of the already-computed mask against zero.
As such, to avoid the extra compare on boolean vectors, we still need the
cbranch_optab, or the new vec_cbranch_* optabs need an extra operand to indicate
what kind of data they hold. Note that this isn't an issue for SVE because
SVE has BImode for booleans.
With these two optabs it's trivial to implement all the optimizations I
described above.
PR target/118974
* optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab,
cond_vec_cbranch_any_optab, cond_vec_cbranch_all_optab,
cond_len_vec_cbranch_any_optab, cond_len_vec_cbranch_all_optab): New.
* doc/md.texi: Document them.
* optabs.cc (prepare_cmp_insn): Refactor to take optab to check for
instead of hardcoded cbranch and support mask and len.
(emit_cmp_and_jump_insn_1, emit_cmp_and_jump_insns): Implement them.
(emit_conditional_move, emit_conditional_add, gen_cond_trap): Update
after changing function signatures to support new optabs.
Nathaniel Shead [Sat, 22 Nov 2025 11:30:32 +0000 (22:30 +1100)]
c++/modules: Walk indirectly exposed namespaces [PR122699]
In some situations, such as friend injection, we may add an entity to a
namespace without ever explicitly opening that namespace in this TU.
We currently have an additional loop to make sure the namespace is
considered purview, but this isn't sufficient to make
walk_module_binding find it, since the namspace itself is not in the
current TU's symbol table. This patch ensures we still process the
(hidden) binding for the injected friend in this TU.
PR c++/122699
gcc/cp/ChangeLog:
* name-lookup.h (expose_existing_namespace): Declare.
* name-lookup.cc (expose_existing_namespace): New function.
(push_namespace): Call it.
* pt.cc (tsubst_friend_function): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/modules/tpl-friend-21_a.C: New test.
* g++.dg/modules/tpl-friend-21_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Andre Vieira [Tue, 25 Nov 2025 11:39:35 +0000 (11:39 +0000)]
arm: add extra sizes to Wstringop-overflow-47.c warning tests
A thumb2 target without an FPU, like -mcpu=cortex-m3, will generate
'vector(4) char' stores which lead to warnings with sizes that weren't being
allowed before by the test. This patch adds these sizes.
gcc/testsuite/ChangeLog:
* gcc.dg/Wstringop-overflow-47.c: Adjust warnings to allow for 32-bit
stores.
Nathaniel Shead [Sun, 23 Nov 2025 12:24:39 +0000 (23:24 +1100)]
c++/modules: Stream all REQUIRES_EXPR_PARMS [PR122789]
We don't generally stream the TREE_CHAIN of a DECL, as this would cause
us to unnecessarily walk into the next member in its scope chain any
time it was referenced by an expression.
Unfortunately, REQUIRES_EXPR_PARMS is a tree chain of PARM_DECLs, so we
were only ever streaming the first parameter. This meant that when a
parameter's type could not be tsubst'd we would ICE instead of returning
false.
This patch special-cases REQUIRES_EXPR to always stream the chain of
decls in its first operand. As a drive-by improvement we also remove a
fixme about checking uncontexted PARM_DECLs.
PR c++/122789
gcc/cp/ChangeLog:
* module.cc (trees_out::core_vals): Treat REQUIRES_EXPR
specially and stream the chained decls of its first operand.
(trees_in::core_vals): Likewise.
(trees_out::tree_node): Check the PARM_DECLs we see are what we
expect.
gcc/testsuite/ChangeLog:
* g++.dg/modules/concept-12_a.C: New test.
* g++.dg/modules/concept-12_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Jason Merrill [Tue, 11 Nov 2025 10:28:01 +0000 (15:58 +0530)]
driver/c++: add --compile-std-module
For simple testcases that want to use the std module, it would be useful to
have a reasonably short way to request building the binary module form
before the testcase. So with this patch users can simply add
--compile-std-module to the command line.
I expect this to be particularly useful on godbolt.org, where currently
building a modules testcase requires messing with cmake. One complication
there is that just adding std.cc to the command line arguments hits the
"cannot specify -o with -S with multiple files" error, so I avoid counting
these added inputs as "multiple files"; in that situation each compile will
output to the same target file, with the user-specified input last so it's
the one that actually ends up in the target after the command completes.
The following testcase ICEs, because
1) -fsanitize=bounds uses TYPE_MAX_VALUE (TYPE_DOMAIN (type)) with
1 or 2 added as last argument of .UBSAN_BOUNDS call and that
expression at that point is some NOP_EXPR around SAVE_EXPR with
testing for negative sizes
2) during gimplification, gimplify_type_sizes is invoked on the DECL_EXPR
outside of an OpenMP region, and forces TYPE_MAX_VALUE into
a pseudo instead, with the SAVE_EXPR obviously being evaluated
before that
3) the OpenMP gimplification sees an implicit or explicit data sharing
of a VLA and among other things arranges to firstprivatize TYPE_MAX_VALUE
4) when gimplifying the .UBSAN_BOUNDS call inside of the OpenMP region,
it sees a SAVE_EXPR and just gimplifies it to the already initialized
s.1 temporary; but nothing marks s.1 for firstprivatization on the
containing construct(s). The problem is that with SAVE_EXPR we never
know if the first use is within the same statement (typical use) or
somewhere earlier in the same OpenMP construct or in an outer OpenMP
construct or its parent etc., the SAVE_EXPR temporary is a function
local var, not something that is added to the innermost scope where
it is used (and it can't because it perhaps could be used outside of
it); so for OpenMP purposes, SAVE_EXPRs better should be used only
within the same OpenMP region and not across the whole function
The following patch fixes it by deferring the addition of
TYPE_MAX_VALUE in the last argument of .UBSAN_BOUNDS until gimplification
for VLAs, if it sees a VLA, instead of making the first argument
0 with pointer to the corresponding array type, it sets the first
argument to 1 with the same type and only sets the last argument to the
addend (1 or 2). And then gimplification can detect this case and
add the TYPE_MAX_VALUE (which in the meantime should have gone through
gimplify_type_sizes).
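The problematic shape can be sketched as follows (a hand-written illustration
of the pattern; the actual testcase is c-c++-common/gomp/pr120052.c and the
names here are hypothetical):

```c
/* Sketch of the pattern from the PR, not the actual testcase: a VLA
   whose SAVE_EXPR'd bound is evaluated outside the OpenMP region but
   referenced by -fsanitize=bounds inside it.  */
int
f (int s)
{
  int a[s];      /* DECL_EXPR gimplified here, outside the region.  */
  int sum = 0;
#pragma omp parallel for
  for (int i = 0; i < s; i++)
    a[i] = i;    /* .UBSAN_BOUNDS here used to reference the SAVE_EXPR
                    temporary without it being firstprivatized.  */
  for (int i = 0; i < s; i++)
    sum += a[i];
  return sum;
}
```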
2025-11-25 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120052
gcc/
* gimplify.cc (gimplify_call_expr): For IFN_UBSAN_BOUNDS
call with integer_onep first argument, change that argument
to 0 and add TYPE_MAX_VALUE (TYPE_DOMAIN (arr_type)) to
3rd argument before gimplification.
gcc/c-family/
* c-ubsan.cc (ubsan_instrument_bounds): For VLAs use
1 instead of 0 as first IFN_UBSAN_BOUNDS argument and only
use the addend without TYPE_MAX_VALUE (TYPE_DOMAIN (type))
in the 3rd argument.
gcc/testsuite/
* c-c++-common/gomp/pr120052.c: New test.
Jakub Jelinek [Tue, 25 Nov 2025 10:18:07 +0000 (11:18 +0100)]
testsuite: Fix up vla-1.c test [PR119931]
From what I can see, the vla-1.c test has been added to test the handling
of debug info for optimized out parameters. But recent changes don't make
the argument optimized away, but optimized away and replaced by constant 5
(even without IPA-VRP). The function is noinline, but can't be noclone
nor noipa exactly because we want to test how it behaves when it is cloned
and the unused argument is dropped.
So, the following patch arranges to hide from the IPA optimizations the
value of x in the caller (and even make sure it is preserved in a register
or stack slot in the caller across the call).
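A common way to hide a value from IPA optimizations (in the spirit of the fix
described above; a hand-written sketch with hypothetical names, not the exact
vla-1.c change) is an empty asm that claims to modify the variable:

```c
/* GNU extension (GCC/Clang): the empty asm makes x's value unknowable
   to the optimizers, and the second asm keeps x live across the call.  */
static int __attribute__ ((noinline))
callee (int unused, int y)
{
  (void) unused;
  return y + 1;
}

int
hide_and_call (void)
{
  int x = 5;
  __asm__ volatile ("" : "+r" (x));   /* IPA can no longer prove x == 5.  */
  int r = callee (x, 2);
  __asm__ volatile ("" :: "r" (x));   /* Preserve x across the call.  */
  return r;
}
```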
2025-11-25 Jakub Jelinek <jakub@redhat.com>
PR testsuite/119931
* gcc.dg/vla-1.c (main): Hide x value from optimizers and use it after
the call as well.
Rainer Orth [Tue, 25 Nov 2025 09:51:38 +0000 (10:51 +0100)]
testsuite: Fix g++.dg/DRs/dr2581-1.C etc. on non-Linux
The g++.dg/DRs/dr2581-?.C tests FAIL on several targets in two ways:
* Both tests FAIL on a couple of targets in the same way:
FAIL: g++.dg/DRs/dr2581-1.C -std=c++11 (test for warnings, line 25)
FAIL: g++.dg/DRs/dr2581-1.C -std=c++17 (test for warnings, line 25)
FAIL: g++.dg/DRs/dr2581-1.C -std=c++20 (test for warnings, line 25)
FAIL: g++.dg/DRs/dr2581-1.C -std=c++23 (test for warnings, line 25)
FAIL: g++.dg/DRs/dr2581-1.C -std=c++26 (test for warnings, line 25)
FAIL: g++.dg/DRs/dr2581-2.C -std=c++11 (test for errors, line 25)
FAIL: g++.dg/DRs/dr2581-2.C -std=c++14 (test for errors, line 25)
FAIL: g++.dg/DRs/dr2581-2.C -std=c++17 (test for errors, line 25)
FAIL: g++.dg/DRs/dr2581-2.C -std=c++20 (test for errors, line 25)
FAIL: g++.dg/DRs/dr2581-2.C -std=c++23 (test for errors, line 25)
FAIL: g++.dg/DRs/dr2581-2.C -std=c++26 (test for errors, line 25)
i.e. the __STDC_ISO_10646__ warning is missing. This happens on
Solaris, FreeBSD, Darwin, and several embedded targets. While
__STDC_ISO_10646__ already exists in C99, it's optional there and
seems to be only defined on Linux/glibc. Thus this patch xfail's this
check on non-Linux.
* Besides, on Solaris only there are more failures for -std=c++11 to
c++26, like
FAIL: g++.dg/DRs/dr2581-2.C -std=c++11 (test for warnings, line 24)
FAIL: g++.dg/DRs/dr2581-2.C -std=c++11 (test for excess errors)
Jakub Jelinek [Tue, 25 Nov 2025 09:30:51 +0000 (10:30 +0100)]
openmp: Fix up OpenMP expansion of collapsed loops [PR120564]
Most of gimple_build_cond_empty callers call just build2 to prepare
condition which is put into GIMPLE_COND, but this one spot has been
using incorrectly fold_build2. Now the arguments of the *build2 were
already gimplified, but the folding of some conditions can turn say
unsigned_var > INT_MAX into (int) unsigned_var < 0 etc. and thus
turn the condition into something invalid in gimple, because we only
try to regimplify the operands if they refer to some decl which needs
to be regimplified (has DECL_VALUE_EXPR on it).
Fixed by also using build2 instead of fold_build2.
2025-11-25 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120564
* omp-expand.cc (extract_omp_for_update_vars): Use build2 instead of
fold_build2 to build argument for gimple_build_cond_empty.
Jakub Jelinek [Tue, 25 Nov 2025 09:06:46 +0000 (10:06 +0100)]
alias: Fix up BITINT_TYPE and non-standard INTEGER_TYPE alias handling [PR122624]
The testcase in the PR is miscompiled on aarch64 with
--param=ggc-min-expand=0 --param=ggc-min-heapsize=0 -O2
(not including it in the testsuite because it is too much of
a lottery).
Anyway, the problem is that the testcase only uses unsigned _BitInt(66)
and never uses _BitInt(66), get_alias_set remembers alias set for
ARRAY_TYPE (of its element type in ARRAY_TYPE's TYPE_ALIAS_SET),
c_common_get_alias_set does not remember the alias set of unsigned types in
TYPE_ALIAS_SET and instead asks for get_alias_set of the corresponding signed
type, which creates a new alias set for each new canonical type.
So, in this case, when being asked about get_alias_set on ARRAY_TYPE
unsigned _BitInt(66) [N], it recurses down to c_common_get_alias_set
which asks for alias set of at that point newly created signed type
_BitInt(66), new alias set is created for it, remembered in that
signed _BitInt(66) TYPE_ALIAS_SET, not remembered in unsigned _BitInt(66)
and remembered in ARRAY_TYPE's TYPE_ALIAS_SET.
Next a GC collection comes, signed _BitInt(66) is not used anywhere in
any reachable from GC roots, so it is removed.
Later on we ask the alias oracle whether the above mentioned ARRAY_TYPE can
for TBAA alias a pointer dereference with the same unsigned _BitInt(66)
type. For the ARRAY_TYPE, we have the above created alias set remembered
in TYPE_ALIAS_SET, so that is what we use, but for the unsigned _BitInt(66)
we don't, so we create a new signed _BitInt(66), create a new alias set for it
and that is what is returned, so we have two distinct alias sets and return
that they can't alias.
Now, for standard INTEGER_TYPEs this isn't a problem, because both the
signed and unsigned variants of those types are always reachable from GTY
roots. For BITINT_TYPE (or build_nonstandard_integer_type built types)
that isn't the case. I'm not convinced we need to fix it for
build_nonstandard_integer_type built INTEGER_TYPEs though, for bit-fields
their address can't be taken in C/C++, but for BITINT_TYPE this clearly
is a problem.
So, the following patch solves it by
1) remembering the alias set we got from get_alias_set on the signed
_BitInt(N) type in the unsigned _BitInt(N) type
2) returning -1 for unsigned _BitInt(1), because there is no corresponding
signed _BitInt type and so we can handle it normally
3) so that the signed _BitInt(N) type isn't later GC collected and later
readded with a new alias set incompatible with the still reachable
unsigned _BitInt(N) type, the patch for signed _BitInt(N) types checks
if corresponding unsigned _BitInt(N) type doesn't already have
TYPE_ALIAS_SET_KNOWN_P, in that case it remembers and returns that;
in order to avoid infinite recursion, it doesn't call get_alias_set
on the unsigned _BitInt(N) type though
4) while type_hash_canon remembers in the type_hash_table both the hash
and the type, so what exactly we use as the hash isn't that important,
I think using type_hash_canon_hash for BITINT_TYPEs is actually better than
hashing TYPE_MAX_VALUE, because especially for larger BITINT_TYPEs
TYPE_MAX_VALUE can have lots of HWIs in the number, for
type_hash_canon_hash hashes for BITINT_TYPEs only
i) TREE_CODE (i.e. BITINT_TYPE)
ii) TYPE_STRUCTURAL_EQUALITY_P flag (likely false)
iii) TYPE_PRECISION
iv) TYPE_UNSIGNED
so 3 ints and one flag, while the old way can hash one HWI up to
1024 HWIs; note it is also more consistent with most other
type_hash_canon calls, except for build_nonstandard_integer_type; for
some reason changing that one to use also type_hash_canon_hash doesn't
work, causes tons of ICEs
This version of the patch handles INTEGER_TYPEs the same as BITINT_TYPE.
2025-11-25 Jakub Jelinek <jakub@redhat.com>
PR middle-end/122624
* tree.cc (build_bitint_type): Use type_hash_canon_hash.
* c-common.cc (c_common_get_alias_set): Fix up handling of BITINT_TYPE
and non-standard INTEGER_TYPEs. For unsigned _BitInt(1) always return
-1. For other unsigned types set TYPE_ALIAS_SET to get_alias_set of
corresponding signed type and return that. For signed types check if
corresponding unsigned type has TYPE_ALIAS_SET_KNOWN_P and if so copy
its TYPE_ALIAS_SET and return that.
In r11-3059-g8183ebcdc1c843, Julian fixed a few issues with
atomic_capture-2.c relying on iteration order guarantees that do not
exist under OpenACC parallelized loops and, notably, do not happen even
by accident on AMDGCN.
The atomic_capture-3.c testcase was made by copying it from
atomic_capture-2.c and adding additional options in commit
r12-310-g4cf3b10f27b199, but from an older version of atomic_capture-2.c,
which lacked these ordering fixes, so they resurfaced in this test.
This patch ports those fixes from atomic_capture-2.c into
atomic_capture-3.c.
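The order-independence requirement can be sketched like this (hand-written,
not the libgomp testcase): under a parallelized loop the captured counter
values arrive in some unspecified order, so a robust check validates that they
form a permutation rather than a fixed sequence.

```c
#include <stdbool.h>

/* True if captured[] holds each of 0..n-1 exactly once, in any order.
   Assumes n <= 64; illustration only.  */
bool
captures_are_permutation (const int *captured, int n)
{
  bool seen[64] = { false };
  for (int i = 0; i < n; i++)
    {
      if (captured[i] < 0 || captured[i] >= n || seen[captured[i]])
        return false;
      seen[captured[i]] = true;
    }
  return true;
}
```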
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/atomic_capture-3.c: Copy
changes in r11-3059-g8183ebcdc1c843 from atomic_capture-2.c.
Rainer Orth [Tue, 25 Nov 2025 08:23:08 +0000 (09:23 +0100)]
testsuite: i386: Restrict pr120936-1.c etc. to Linux
After switching the i386 check-function-bodies tests to use the new
dg-add-options check_function_bodies feature, several tests still FAIL
in the same way on Solaris/x86. E.g.
i.e. the test expects a call to mcount, while on Solaris _mcount is
called instead. MCOUNT_NAME is only defined as mcount in gnu-user.h and
x86-64.h, so the patch restricts the tests to Linux.
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
Rainer Orth [Tue, 25 Nov 2025 08:20:54 +0000 (09:20 +0100)]
testsuite: i386: Guard NO_PROFILE_COUNTERS tests
After switching the i386 check-function-bodies tests to use the new
dg-add-options check_function_bodies feature, several tests still FAIL
in similar ways on Solaris/x86:
* Some FAIL like this:
FAIL: gcc.target/i386/pr120936-6.c (test for excess errors)
Excess errors:
cc1: error: '-mnop-mcount' is not compatible with this target
This happens because -mnop-mcount is only supported in
i386/i386-options.cc (ix86_option_override_internal) if
NO_PROFILE_COUNTERS.
* Others FAIL like
FAIL: gcc.target/i386/pr120936-10.c (test for excess errors)
Excess errors:
gcc/testsuite/gcc.target/i386/pr120936-10.c:23:1: sorry, unimplemented: profiling '-mcmodel=large' with PIC is not supported
This error is generated in i386/i386.cc (x86_function_profiler) if
!NO_PROFILE_COUNTERS.
NO_PROFILE_COUNTERS is only defined in dragonfly.h, x86-64.h,
gnu-user.h, freebsd.h, cygming.h, and netbsd-elf.h. However, a couple
of similar tests are restricted to Linux only, so this patch follows
suit. One could introduce a new effective-target keyword to fully
handle this, though.
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
Rainer Orth [Tue, 25 Nov 2025 08:18:13 +0000 (09:18 +0100)]
testsuite: i386: Handle check-function-bodies options using dg-add-options
The {gcc,g++}.target/i386 tests that use dg-final { check-function-bodies }
need additional options on Solaris/x86. So far, those tests have been
updated manually to add the required -fdwarf2-cfi-asm
-fasynchronous-unwind-tables. However, this has two issues:
* Those Solaris-only options make dg-options harder to read, although
they do no harm on other targets.
* Besides, the need for those options repeatedly got forgotten for each
new bunch of such tests.
To avoid that problem in the future, this patch introduces a new
dg-add-options feature, check_function_bodies, that adds those options
exactly on the targets that need it. It both improves readability and
will hopefully not be forgotten again for future tests.
Tested on i386-pc-solaris2.11 with as/ld and gas/ld, and
x86_64-pc-linux-gnu.
All of them FAIL in the same way: when gas is used, the tests contain
something like
.uaword .LLASF4 ! DW_AT_const_value: "my_foo"
while for /bin/as
.ascii "my_foo\0" ! DW_AT_const_value
is emitted. While other dwarf2 tests support both forms, the tests
above don't. This patch fixes this. To make the regex more readable,
they are switched to using braces instead of double quotes, thus
avoiding excessive escaping. At the same time, they now use
newline-sensitive matching to avoid .* matching across lines.
Tested on sparc-sun-solaris2.11 with as and gas, and
x86_64-pc-linux-gnu.
Kito Cheng [Mon, 24 Nov 2025 07:00:07 +0000 (15:00 +0800)]
c-family: Don't register include paths with -fpreprocessed
This fixes a permission error that occurs when cross-compiling with
-save-temps and a relocated toolchain, where the original build path
exists but is inaccessible.
The issue occurs when:
- Building the toolchain at /home/scratch/build/
- Installing it to another location like /home/user/rv64-toolchain/
- The /home/scratch directory exists but has insufficient permissions
(e.g. drwx------ for /home/scratch/)
Without this fix, cc1 would report errors like:
cc1: error: /home/scratch/build/install/riscv64-unknown-elf/usr/local/include: Permission denied
This occurred because the GCC driver did not pass GCC_EXEC_PREFIX and
isysroot to cc1/cc1plus in the compile stage when using -save-temps.
This caused cc1/cc1plus to search for headers from the wrong (original
build) path instead of the relocated installation path.
The fix ensures cc1/cc1plus won't try to collect include paths when
-fpreprocessed is specified. This prevents the permission error during
cross-compilation with -save-temps, as the preprocessed file already
contains all necessary headers.
gcc/c-family/ChangeLog:
* c-opts.cc (c_common_post_options): Skip register_include_chains
when cpp_opts->preprocessed is set.
Alexandre Oliva [Mon, 24 Nov 2025 23:07:20 +0000 (20:07 -0300)]
ira: sort allocno_hard_regs by regset
Using hashes of data structures for tie breaking makes sorting
dependent on type sizes, padding, and endianness, i.e., unstable
across different hosts.
ira-color.cc's allocno_hard_regs_compare does that, and it causes
different register allocations to be chosen for the same target
depending on the host. That's undesirable.
Compare the HARD_REG_SETs directly instead, looking for the
lowest-numbered difference register to use as the tie breaker for the
cost compare.
With a hardware implementation of ctz, this is likely faster than the
hash used to break ties before.
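The new tie breaker can be sketched for a single-word register set (a
hand-written illustration of the idea behind hard_reg_set_first_diff, not the
GCC implementation, which operates on multi-word HARD_REG_SETs):

```c
/* Index of the lowest-numbered register present in exactly one of the
   two sets, or -1 if the sets are equal.  Uses the GCC/Clang ctz
   builtin, which maps to a single instruction on most hosts.  */
int
first_diff_reg (unsigned long long set1, unsigned long long set2)
{
  unsigned long long diff = set1 ^ set2;
  return diff ? __builtin_ctzll (diff) : -1;
}
```

Because the result depends only on the register numbering, the sort order is
the same on every host, unlike a hash of the raw data structure.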
for gcc/ChangeLog
PR rtl-optimization/122767
* ira-color.cc (allocno_hard_regs_compare): Break ties
using...
* hard-reg-set.h (hard_reg_set_first_diff): ... this. New
HARD_REG_SET API entry point.
Robin Dapp [Fri, 14 Nov 2025 14:01:29 +0000 (15:01 +0100)]
vect: Make SELECT_VL a convert optab.
Currently select_vl is a direct optab with its mode always Xmode/Pmode.
This does not give us sufficient freedom to enable/disable vsetvl
(=SELECT_VL) depending on the vector mode.
This patch makes select_vl a convert optab and adjusts the associated IFN
functions as well as the query/emit code in the vectorizer.
With this patch nothing new is actually exercised yet. This is going to
happen in a separate riscv patch that enables "VLS" select_vl.
Robin Dapp [Wed, 22 Oct 2025 19:14:36 +0000 (21:14 +0200)]
RISC-V: Add BF VLS modes and document iterators.
We're missing some VLS BF modes, e.g. for gathers. This patch adds them.
While at it, it adds some documentation to the iterators and corrects
the vec_set iterator (for the time being).
Regtested on rv64gcv_zvl512b but curious what the CI says.
Robin Dapp [Sat, 22 Nov 2025 19:53:25 +0000 (20:53 +0100)]
vect: Use start value in vect_load_perm_consecutive_p [PR122797].
vect_load_perm_consecutive_p is used in a spot where we want to check
that a permutation is consecutive and starting with 0. Originally I
wanted to add this way of checking to the function but what I ended
up with is checking whether the given permutation is consecutive
starting from a certain index. Thus, we will return true for
e.g. {1, 2, 3} which doesn't make sense in the context of the PR.
This patch corrects it.
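The corrected check can be sketched as follows (a hand-written illustration,
not the vectorizer code): the permutation must be consecutive starting from a
given value, so {1, 2, 3} is consecutive from 1 but not from 0.

```c
#include <stdbool.h>

/* True if perm[i] == start + i for all i, i.e. the permutation is
   consecutive starting at the given value.  Illustration only.  */
bool
perm_consecutive_from_p (const unsigned *perm, unsigned n, unsigned start)
{
  for (unsigned i = 0; i < n; i++)
    if (perm[i] != start + i)
      return false;
  return true;
}
```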
PR tree-optimization/122797
gcc/ChangeLog:
* tree-vect-slp.cc (vect_load_perm_consecutive_p): Check
permutation start at element 0 with value instead of starting
at a given element.
(vect_optimize_slp_pass::remove_redundant_permutations):
Use start value of 0.
* tree-vectorizer.h (vect_load_perm_consecutive_p): Set default
value to UINT_MAX.
Robin Dapp [Sun, 16 Nov 2025 17:42:04 +0000 (18:42 +0100)]
forwprop: Allow nop conversions for vector constructor.
I observed a vect-construct forwprop opportunity in x264 that we
could handle when checking for a nop conversion instead of a useless
conversion. IMHO a nop-conversion check is sufficient as we're only
dealing with permutations in simplify_vector_constructor.
This patch replaces uses of useless_type_conversion_p with
tree_nop_conversion_p in simplify_vector_constructor.
It was bootstrapped and regtested on x86 and power10, regtested on
aarch64 and riscv64.
There is a single scan-test failure on power
(gcc.target/powerpc/builtins-1.c). The code actually looks better so
I took the liberty of adjusting the test expectation.