Jie Mei [Sat, 14 Sep 2024 07:04:09 +0000 (15:04 +0800)]
MIPS: Add MSUBF.fmt instruction for MIPSr6
GCC currently uses two instructions (NEG.fmt and MADDF.fmt) for
operations like `x - (y * z)' for MIPSr6. We can further tune this by
using only MSUBF.fmt instead of those two.
This patch adds MSUBF.fmt instructions with corresponding tests.
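As a rough illustration of the source-level shape involved, the sketch below writes `x - (y * z)' as one fused operation using the standard std::fma. The name fused_msub is made up for this example, and whether a particular GCC build selects MSUBF.fmt for it depends on the target and options, which is not verified here.

```cpp
#include <cassert>
#include <cmath>

// Computes x - (y * z) with a single rounding step, i.e. the fused
// multiply-subtract shape described above. fused_msub is a hypothetical
// name used only for this sketch.
double fused_msub(double x, double y, double z)
{
  return std::fma(-y, z, x);
}
```

For example, fused_msub (10.0, 2.0, 3.0) evaluates 10 - 2 * 3.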
gcc/ChangeLog:
* config/mips/mips.md (fms<mode>4): Generates MSUBF.fmt
instructions.
(*fms<mode>4_msubf): Same as above.
(fnma<mode>4): Same as above.
(*fnma<mode>4_msubf): Same as above.
gcc/testsuite/ChangeLog:
* gcc.target/mips/mips-msubf.c: New tests for MIPSr6.
Add a new diagnostic, -Wmultiple-parameter-fwd-decl-lists, which
diagnoses uses of this obsolescent syntax.
Add this diagnostic in -Wextra.
Forward declarations of parameters are very rarely used. And functions
that need two forward declarations of parameters are also quite rare.
This combination results in this code almost not existing in any code
base, which makes adding this to -Wextra okay. FWIW, I've tried finding
such code using a code search engine, and didn't find any cases (but the
regex for that isn't easy to write, so I wouldn't trust it).
* doc/extend.texi: Clarify documentation about lists of
parameter forward declarations, and mention that more than one
of them are unnecessary.
* doc/invoke.texi: Document the new
-Wmultiple-parameter-fwd-decl-lists.
gcc/testsuite/ChangeLog:
* gcc.dg/Wmultiple-parameter-fwd-decl-lists.c: New test.
Harald Anlauf [Fri, 26 Sep 2025 17:20:39 +0000 (19:20 +0200)]
Fortran: fix uninitialized read in testcase gfortran.dg/pdt_48.f03
Running the testcase using valgrind --leak-check=full --track-origins=yes:
==28585== Conditional jump or move depends on uninitialised value(s)
==28585== at 0x400E19: MAIN__ (pdt_48.f03:48)
==28585== by 0x400EDB: main (pdt_48.f03:34)
==28585== Uninitialised value was created by a heap allocation
==28585== at 0x4841984: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28585== by 0x400975: __pdt_m_MOD_add (pdt_48.f03:30)
==28585== by 0x400D84: MAIN__ (pdt_48.f03:44)
==28585== by 0x400EDB: main (pdt_48.f03:34)
The cause was a partial initialization of a vector used in a subsequent
addition. Initialize the remaining elements of the first vector to zero.
Jan Hubicka [Fri, 26 Sep 2025 13:57:03 +0000 (15:57 +0200)]
Fix precise 0 handling in afdo_propagate_edge
Currently afdo_propagate_edge will turn a precise 0 into an autofdo 0 because it thinks
auto-profile claims some samples have been executed in the given basic block, while
this is only a consequence of < being defined by
0 (precise) < 0 (autofdo)
gcc/ChangeLog:
* auto-profile.cc (afdo_propagate_edge): Fix handling of precise 0
counts.
Andrew Stubbs [Wed, 24 Sep 2025 11:58:23 +0000 (11:58 +0000)]
amdgcn: Remove vector alignment restrictions
The supported misalignment logic seems to be a bit arbitrary. Some of it looks
like it was copied from the Arm implementation, although testing shows that the
packed accesses do not work (weird subregs happen).
AMD GCN does have some alignment restrictions on Buffer instructions, but as we
don't use those, that's irrelevant. The Flat and Global instructions (that we do
use) have no such restrictions.
LDS memory -- which can be accessed via Flat instructions -- does have
alignment restrictions, but the compiler is not using LDS for arbitrary
vectors. If the user deliberately chooses to place unaligned data in
low-latency memory then a runtime exception should occur (no silent bad
behaviour), so there's no reason to pessimise the normal case.
gcc/ChangeLog:
* config/gcn/gcn.cc
(gcn_vectorize_support_vector_misalignment): Allow any alignment, as
long as it's not packed.
Joseph Myers [Fri, 26 Sep 2025 11:12:12 +0000 (11:12 +0000)]
c: Give permerror for excess braces in scalar initializers [PR88642]
As noted in bug 88642, the C front end fails to give errors or
pedwarns for scalar initializers with too many levels of surrounding
braces. There is a warning for redundant braces around a scalar
initializer within a larger braced initializer (valid for a single
such level within a structure, union or array initializer; not valid
for more than one such level, or where the outer layer of braces is
itself for a scalar, either redundant braces themselves or part of a
compound literal), but this never becomes an error even for invalid
cases. Check for this case and turn the warning into a permerror when
there are more levels of braces than permitted. The existing warning
is unchanged for a single (permitted) level of redundant braces around
a scalar initializer inside a structure, union or array initializer,
and it's also unchanged that no such warning is given for a single
(permitted) level of redundant braces around a top-level scalar
initializer.
Technically this is a C2y issue (these rules on valid initializers
moved into Constraints as a result of N3346, accepted in Minneapolis;
previously, as a "shall" outside constraints, violating these rules
resulted in compile-time undefined behavior without requiring a
diagnostic).
Hopefully little code is actually relying on not getting an error
here. In view of gcc.dg/tree-ssa/ssa-dse-10.c showing that at least
some code may be using such over-braced initializers (initializer of
pubKeys at line 1167 in that test; I'm not at all sure how that
initializer ends up getting interpreted to translate it to something
equivalent but properly structured), this is made a permerror rather
than a hard error, so -fpermissive (as already used by that test) can
be used to disable the error (the default -fpermissive for old
standards modes is not a problem given that before C2y this is
undefined behavior not a constraint violation).
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/88642
gcc/c/
* c-typeck.cc (constructor_braced_scalar): New variable.
(struct constructor_stack): Add braced_scalar field.
(really_start_incremental_init): Handle constructor_braced_scalar
and braced_scalar field.
(push_init_level): Handle constructor_braced_scalar and
braced_scalar field. Give permerror rather than warning for
nested braces around scalar initializer.
(pop_init_level): Handle constructor_braced_scalar and
braced_scalar field.
Jan Hubicka [Fri, 26 Sep 2025 10:39:07 +0000 (12:39 +0200)]
Fix integer overflow in profile_count::probability_in
This patch fixes integer overflow in profile_count::probability_in which happens
for very large counts. This was probably not that common in practice until
scaled AutoFDO profiles were introduced.
This was introduced as cut&paste from profile_probability implementation.
I reviewed multiplications in the file for safety and noticed that in some
cases the code is over-protective. In profile_probability::operator/ we already
scale that m_val <= other.m_val and thus we know result will be in the range
0...max_probability. In profile_probability::apply_scale we deal with 30bit
value from profile_probability so no overflow can happen.
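The kind of overflow involved can be sketched as follows. This is not the code from profile-count.h: the scale constant, the function names and the fallback strategy are all assumptions chosen for illustration, and __builtin_mul_overflow is the GCC/Clang builtin.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical fixed-point scale for a probability; not GCC's constant.
constexpr std::uint64_t max_probability_sketch = std::uint64_t(1) << 30;

// Naive form: count * scale wraps around once count reaches 2^34.
std::uint64_t probability_naive(std::uint64_t count, std::uint64_t total)
{
  return count * max_probability_sketch / total;
}

// Guarded form: detect the overflow and trade a little precision for a
// result that stays in the range 0...max_probability_sketch.
std::uint64_t probability_checked(std::uint64_t count, std::uint64_t total)
{
  assert(total != 0 && count <= total);
  std::uint64_t product;
  if (!__builtin_mul_overflow(count, max_probability_sketch, &product))
    return product / total;
  // Here count >= 2^34, so total / scale is at least 16 and never zero.
  return std::min(count / (total / max_probability_sketch),
                  max_probability_sketch);
}
```

With count = 2^40 and total = 2^41 the naive form wraps to 0, while the guarded form yields half of the scale.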
gcc/ChangeLog:
* profile-count.h (profile_probability::operator/): Do not cap
twice.
(profile_probability::operator/=): Likewise.
(profile_probability::apply_scale): Do not watch for overflow.
(profile_count::probability_in): Watch overflow.
Jonathan Wakely [Wed, 10 Sep 2025 09:10:07 +0000 (10:10 +0100)]
libstdc++: Reuse predicates in std::search and std::is_permutation
Hoist construction of the call wrappers out of the loop when we're
repeatedly creating a call wrapper with the same bound arguments.
We need to be careful about iterators that return proxy references,
because bind1st(pred, *first) could bind a reference to a prvalue proxy
reference returned by *first. That would then be an invalid reference by
the time we invoked the call wrapper.
If we dereference the iterator first and store the result of that on the
stack, then we don't have a prvalue proxy reference, and can bind it (or
the value it refers to) into the call wrapper:
auto&& val = *first; // lifetime extension
auto wrapper = bind1st(pred, val);
for (;;)
/* use wrapper */;
This ensures that the reference returned from *first outlives the call
wrapper, whether it's a proxy reference or not.
For C++98 compatibility in __search we can use __decltype(expr) instead
of auto&&.
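A self-contained sketch of the same idea, using std::vector<bool> because dereferencing its iterator yields a prvalue proxy. The function count_like_first and the lambda standing in for the bound call wrapper are illustrative only and are not the code in stl_algobase.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the elements equal to the first one.
std::size_t count_like_first(std::vector<bool> bits)
{
  assert(!bits.empty());
  auto first = bits.begin();
  auto&& val = *first; // lifetime extension: the proxy lives as long as val
  // Stand-in for the bound call wrapper: built once, outside the loop,
  // and referring to val rather than to a temporary.
  auto wrapper = [&val](bool other) { return bool(val) == other; };
  std::size_t n = 0;
  for (auto it = first; it != bits.end(); ++it)
    if (wrapper(*it))
      ++n;
  return n;
}
```

The wrapper is created once and reused for every element, and the proxy it refers to stays alive for the whole loop.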
libstdc++-v3/ChangeLog:
* include/bits/stl_algobase.h (__search, __is_permutation):
Reuse predicate instead of creating a new one each time.
* include/bits/stl_algo.h (__is_permutation): Likewise.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Fri, 19 Sep 2025 15:03:11 +0000 (16:03 +0100)]
libstdc++: Simplify std::erase functions for sequence containers
This removes the use of std::ref that meant that __remove_if used an
indirection through the reference, which might be a pessimization. Users
can always use std::ref to pass expensive predicates into erase_if, but
we shouldn't do it unconditionally. We can std::move the predicate so
that if it's not cheap to copy and the user didn't use std::ref, then we
try to use a cheaper move instead of a copy.
There's no reason that std::erase shouldn't just be implemented by
forwarding to std::erase_if. I probably should have done that in
r12-4083-gacf3a21cbc26b3 when std::erase started to call __remove_if
directly.
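The resulting shape can be sketched for std::vector as below. The sketch namespace and the helper without are made up, and std::remove_if stands in for the internal __remove_if; the real headers differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

namespace sketch
{
  // erase_if moves the predicate into the algorithm instead of wrapping
  // it in std::ref.
  template<typename T, typename Pred>
  typename std::vector<T>::size_type
  erase_if(std::vector<T>& cont, Pred pred)
  {
    typedef typename std::vector<T>::size_type size_type;
    auto new_end = std::remove_if(cont.begin(), cont.end(), std::move(pred));
    const size_type removed = static_cast<size_type>(cont.end() - new_end);
    cont.erase(new_end, cont.end());
    return removed;
  }

  // erase simply forwards to erase_if with an equality predicate.
  template<typename T, typename U>
  typename std::vector<T>::size_type
  erase(std::vector<T>& cont, const U& value)
  {
    return sketch::erase_if(cont,
                            [&value](const T& elem) { return elem == value; });
  }
}

// Helper for trying the sketch: returns a copy of v without `value`.
std::vector<int> without(std::vector<int> v, int value)
{
  const auto before = v.size();
  const auto removed = sketch::erase(v, value);
  assert(v.size() + removed == before);
  return v;
}
```

A caller who has an expensive predicate can still pass std::ref(pred) explicitly; the sketch just no longer does so on the caller's behalf.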
libstdc++-v3/ChangeLog:
* include/std/deque (erase_if): Move predicate instead of
wrapping with std::ref.
(erase): Forward to erase_if.
* include/std/inplace_vector (erase_if, erase): Likewise.
* include/std/string (erase_if, erase): Likewise.
* include/std/vector (erase_if, erase): Likewise.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Fri, 13 Jun 2025 16:27:51 +0000 (17:27 +0100)]
libstdc++: Eliminate __gnu_cxx::__ops function objects
This removes the indirect functors from <bits/predefined_ops.h> that are
used by our STL algorithms. Currently we wrap all predicates and values
into callables which accept iterator arguments, and automatically
dereference the iterators. With this change we no longer do that
dereferencing and so all predicates are passed values not iterators, and
the algorithms that invoke those predicates must dereference the
iterators.
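The difference between the two styles can be sketched like this. All names (iter_pred_sketch, find_if_old, find_if_new, first_even_index) are hypothetical stand-ins and not the identifiers used in <bits/predefined_ops.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before (sketch): the predicate is wrapped so that it accepts an
// iterator and dereferences it itself.
template<typename Pred>
struct iter_pred_sketch
{
  Pred pred;
  template<typename Iter>
  bool operator()(Iter it) { return bool(pred(*it)); }
};

template<typename Iter, typename IterPred>
Iter find_if_old(Iter first, Iter last, IterPred iter_pred)
{
  while (first != last && !iter_pred(first))
    ++first;
  return first;
}

// After (sketch): the user's predicate is passed through unchanged and
// the algorithm dereferences the iterator.
template<typename Iter, typename Pred>
Iter find_if_new(Iter first, Iter last, Pred pred)
{
  while (first != last && !pred(*first))
    ++first;
  return first;
}

inline bool is_even(int x) { return x % 2 == 0; }

std::size_t first_even_index(const std::vector<int>& v)
{
  auto old_it = find_if_old(v.begin(), v.end(),
                            iter_pred_sketch<bool (*)(int)>{is_even});
  auto new_it = find_if_new(v.begin(), v.end(), is_even);
  assert(old_it == new_it); // both styles must agree
  return static_cast<std::size_t>(new_it - v.begin());
}
```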
This avoids wrapping user-provided predicates into another predicate
that does the dereferencing. User-provided predicates are now passed
unchanged to our internal algos like __search_n. For the overloads that
take a value instead of a predicate, we still need to create a predicate
that does comparison to the value, but we can now use std::less<void>
and std::equal_to<void> as the base predicate and bind the value to
those base predicates.
Because the "transparent operators" std::less<void> and
std::equal_to<void> were not added until C++14, this change defines
those explicit specializations unconditionally for C++98 and C++11 too
(but the default template arguments that make std::less<> and
std::equal_to<> refer to those specializations are still only present
for C++14 and later, because we don't need to rely on those default
template arguments for our internal uses).
When binding a predicate and a value into a new call wrapper, we now
decide whether to store the predicate by value when it's an empty type
or a scalar (such as a function pointer). This avoids a
double-indirection through function pointers, and avoids storing and
invoking stateless empty functors through a reference. For C++11 and
later we also use [[no_unique_address]] to avoid wasted storage for
empty predicates (which includes all standard relational ops, such as
std::less).
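A minimal sketch of such a call wrapper, assuming a compiler that accepts [[no_unique_address]]. The names bind2nd_sketch and index_of are invented for this example and the real wrappers in bits/predefined_ops.h are more general.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Binds a value as the second argument of a binary predicate. The empty
// predicate is stored by value; the attribute lets it share storage with
// the reference member where the compiler supports that.
template<typename Pred, typename Val>
struct bind2nd_sketch
{
  [[no_unique_address]] Pred pred;
  const Val& val;

  // Non-const on purpose: the wrapped predicate need not be
  // const-invocable.
  template<typename T>
  bool operator()(T&& lhs) { return bool(pred(std::forward<T>(lhs), val)); }
};

std::size_t index_of(const std::vector<int>& v, int value)
{
  bind2nd_sketch<std::equal_to<>, int> is_value{std::equal_to<>{}, value};
  auto it = std::find_if(v.begin(), v.end(), is_value);
  assert(it == v.end() || *it == value);
  return static_cast<std::size_t>(it - v.begin());
}
```

Here std::equal_to<> is the base predicate and the searched-for value is bound to it, so no separate "compare to value" functor type is needed.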
The call wrappers in bits/predefined_ops.h all have non-const operator()
because we can't be sure that the predicates they wrap are
const-invocable. The requirements in [algorithms.requirements] for
Predicate and BinaryPredicate template arguments require pred(*i) to be
valid, but do not require that std::as_const(pred)(*i) has to be valid,
and similarly for binary_pred.
Jonathan Wakely [Thu, 25 Sep 2025 16:23:28 +0000 (17:23 +0100)]
libstdc++: Fix unsafe comma operators in <random> [PR122062]
This fixes a 'for' loop in std::piecewise_linear_distribution that
increments two iterators with a comma operator between them, making it
vulnerable to evil overloads of the comma operator.
It also changes a 'for' loop used by some other distributions, even
though those are only used with std::vector<double>::iterator and so
won't find any overloaded commas.
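The problem and the fix can be reproduced with a deliberately hostile iterator type. evil_iter and dot_product are made up for this sketch and are unrelated to the code in random.tcc.

```cpp
#include <cassert>

// An iterator-like type with a hostile comma operator: any use of the
// overloaded comma fails to compile.
struct evil_iter
{
  const int* p;
  int operator*() const { return *p; }
  evil_iter& operator++() { ++p; return *this; }
  bool operator!=(const evil_iter& other) const { return p != other.p; }
  template<typename T>
  void operator,(T&&) const = delete;
};

template<typename Iter1, typename Iter2>
int dot_product(Iter1 a, Iter1 a_end, Iter2 b)
{
  int sum = 0;
  // Writing "++a, ++b" here would select evil_iter::operator, and be
  // rejected. Casting the operands to void forces the built-in comma.
  for (; a != a_end; (void) ++a, (void) ++b)
    sum += *a * *b;
  return sum;
}

int dot_product_demo()
{
  static const int x[] = {1, 2, 3};
  static const int y[] = {4, 5, 6};
  const int result = dot_product(evil_iter{x}, evil_iter{x + 3}, evil_iter{y});
  assert(result == 1 * 4 + 2 * 5 + 3 * 6);
  return result;
}
```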
libstdc++-v3/ChangeLog:
PR libstdc++/122062
* include/bits/random.tcc (__detail::__normalize): Use void cast
for operands of comma operator.
(piecewise_linear_distribution): Likewise.
* testsuite/26_numerics/random/piecewise_linear_distribution/cons/122062.cc:
New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Hewill Kang <hewillk@gmail.com>
Paul Thomas [Fri, 26 Sep 2025 06:30:07 +0000 (07:30 +0100)]
Fortran: Fix uninitialized reads for pdt_13.f03 etc. [PR122002]
2025-09-26 Harald Anlauf <anlauf@gcc.gnu.org>
gcc/fortran
PR fortran/122002
* decl.cc (gfc_get_pdt_instance): Initialize 'instance' to NULL
and set 'kind_value' to zero before calling gfc_extract_int.
* primary.cc (gfc_match_rvalue): Initialize 'ctr_arglist' to
NULL and test for default values if gfc_get_pdt_instance
returns NULL.
Because LoongArch does not implement TARGET_CAN_INLINE_P,
functions with the target attribute set and those without
it cannot be inlined. At the same time, setting the
always_inline attribute will cause compilation failure.
To solve this problem, I implemented this hook. During the
implementation process, it checks the status of the target
special options of the caller and callee, such as the ISA
extension.
PR target/121875
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_can_inline_p): New function.
(TARGET_CAN_INLINE_P): Define.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/can_inline_1.c: New test.
* gcc.target/loongarch/can_inline_2.c: New test.
* gcc.target/loongarch/can_inline_3.c: New test.
* gcc.target/loongarch/can_inline_4.c: New test.
* gcc.target/loongarch/can_inline_5.c: New test.
* gcc.target/loongarch/can_inline_6.c: New test.
* gcc.target/loongarch/pr121875.c: New test.
For padded layouts we want to check that the product of the
padded stride with the remaining extents is representable.
Creating a second overload, allows passing in subspans of the
static extents and retains the ergonomics for the common case
of passing in all static extents.
libstdc++-v3/ChangeLog:
* include/std/mdspan (__static_quotient): New overload.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
Jonathan Wakely [Wed, 24 Sep 2025 13:37:37 +0000 (14:37 +0100)]
libstdc++: Check feature test macro for robust_nonmodifying_seq_ops
We should check the relevant feature test macro instead of just the
value of __cplusplus.
Also add a comment explaining why the __cplusplus check guarding
__sample *can't* be changed to check __glibcxx_sample (because __sample
is also used in C++14 by std::experimental::sample, not only by C++17
std::sample).
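The same principle applies in user code. The sketch below tests the user-visible macro __cpp_lib_robust_nonmodifying_seq_ops rather than __cplusplus; the library itself checks its own internal macro, and same_sequence is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

bool same_sequence(const std::vector<int>& a, const std::vector<int>& b)
{
#ifdef __cpp_lib_robust_nonmodifying_seq_ops
  // Four-iterator overload is available: the library checks both ranges.
  return std::equal(a.begin(), a.end(), b.begin(), b.end());
#else
  // Fallback: compare lengths by hand before the three-iterator form.
  return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin());
#endif
}
```

Keying the code on the feature test macro means it keeps working if the feature is ever enabled or disabled independently of the language mode.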
libstdc++-v3/ChangeLog:
* include/bits/stl_algo.h: Check robust_nonmodifying_seq_ops
feature test macro instead of checking __cplusplus value. Add
comment to another __cplusplus check.
* include/bits/stl_algobase.h: Add comment to #endif.
Jonathan Wakely [Fri, 19 Sep 2025 11:11:26 +0000 (12:11 +0100)]
libstdc++: Remove unwanted STDC_HEADERS macro from c++config.h [PR79147]
Similar to r16-4034-g1953939243e1ab, this comments out another macro
that Autoconf adds to the generated config.h but which is not wanted in
the c++config.h file that we install.
There's no benefit to defining _GLIBCXX_STDC_HEADERS in user code, so we
should just prevent it from being defined.
libstdc++-v3/ChangeLog:
PR libstdc++/79147
PR libstdc++/103650
* include/Makefile.am (c++config.h): Adjust sed command to
comment out STDC_HEADERS macro.
* include/Makefile.in: Regenerate.
libstdc++: Prepare mapping layout tests for padded layouts.
Using the existing tests for padded layouts requires the following
changes:
* The padded layouts are template classes. In order to be able to use
partially specialized templates, functions need to be converted to
structs.
* The layout mapping tests include a check that only applies if
is_exhaustive is static. This commit introduces a concept to check if
is_exhaustive is a static member function.
* Fix a test to not use a hard-coded layout_left.
The test empty.cc contains indentation mistakes that are fixed.
libstdc++-v3/ChangeLog:
* testsuite/23_containers/mdspan/layouts/empty.cc: Fix indent.
* testsuite/23_containers/mdspan/layouts/mapping.cc
(test_stride_1d): Fix test.
(test_stride_2d): Rewrite using a struct.
(test_stride_3d): Ditto.
(has_static_is_exhaustive): New concept.
(test_mapping_properties): Update test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
This assertion, despite what I said in r16-4070, is not valid: we can
reach here when deduping a VAR_DECL that didn't get a LANG_SPECIFIC in
the current TU. It's still correct to always use lang_cplusplus however
as for anything else the decl would have been created with an
appropriate LANG_SPECIFIC to start with.
[PATCH][PR target/121778] RISC-V: Improve rotation detection for RISC-V
This patch splits the canonical sign-bit checking idiom
into a 2-insn sequence when Zbb is available. Combine often normalizes
(xor (lshr A, (W - 1)) 1) to (ge A, 0). For width W = bitsize (mode), the
identity:
(a << 1) | (a >= 0) == (a << 1) | ((a >> (W - 1)) ^ 1) == ROL1 (a) ^ 1
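The identity can be checked mechanically for W = 32. The function names below are illustrative, shifts are done on unsigned values, and the cast to a signed type assumes the usual two's complement conversion (guaranteed since C++20, and GCC's behaviour before that).

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by one bit, W = 32.
constexpr std::uint32_t rol1(std::uint32_t a)
{
  return (a << 1) | (a >> 31);
}

// (a << 1) | (a >= 0), with a interpreted as a signed value.
constexpr std::uint32_t shift_or_ge(std::uint32_t a)
{
  return (a << 1)
         | static_cast<std::uint32_t>(static_cast<std::int32_t>(a) >= 0);
}

// (a << 1) | ((a >> (W - 1)) ^ 1)
constexpr std::uint32_t shift_or_xor(std::uint32_t a)
{
  return (a << 1) | ((a >> 31) ^ 1u);
}

// ROL1 (a) ^ 1
constexpr std::uint32_t rol1_xor(std::uint32_t a)
{
  return rol1(a) ^ 1u;
}

constexpr bool identity_holds(std::uint32_t a)
{
  return shift_or_ge(a) == shift_or_xor(a)
         && shift_or_xor(a) == rol1_xor(a);
}

static_assert(identity_holds(0u) && identity_holds(0x80000000u),
              "sign-bit clear and sign-bit set cases");
```

The key observation is that (a << 1) always has a zero low bit, so OR-ing in the inverted sign bit equals rotating the sign bit into bit 0 and then flipping it.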
Joseph Myers [Wed, 24 Sep 2025 19:53:01 +0000 (19:53 +0000)]
c: Fix handling of register atomic compound literals
The logic for loads and stores of _Atomic objects in the C front end
involves taking the address of such objects, with really_atomic_lvalue
detecting cases where this cannot be done (and also no special
handling is needed for atomicity), in particular register _Atomic
objects. This logic failed to deal with the case of register _Atomic
compound literals, so resulting in spurious errors "error: address of
register compound literal requested" followed by "error: argument 1 of
'__atomic_load' must be a non-void pointer type". (This is a C23 bug
that I found while changing really_atomic_lvalue as part of previous
C2y changes.) Add a use of COMPOUND_LITERAL_EXPR_DECL in that case.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
gcc/c/
* c-typeck.cc (really_atomic_lvalue): For a COMPOUND_LITERAL_EXPR,
check C_DECL_REGISTER on the COMPOUND_LITERAL_EXPR_DECL.
gcc/testsuite/
* gcc.dg/c23-complit-9.c: New test.
In the testcase from the PR, an assertion triggers because the compiler
tries to access the parent namespace of a contained procedure. But the
namespace is the formal namespace of a module procedure symbol in a
submodule, which doesn't have its parent set.
To add a bit of context, in submodules, module procedures inherited from
their parent module have two different namespaces holding their dummy
arguments. The first one is generated by the host association of
the module from the .mod file, and is made accessible in the procedure
symbol's formal_ns field. Its parent field is not set. The second one
is generated by the parser and contains the procedure implementation.
It's accessible from the list of contained procedures in the submodule
namespace. Its parent field is set.
This change modifies gfc_get_procedure_ns to favor the parser-generated
namespace in the submodule case where there are two namespaces to choose
from.
PR fortran/122046
gcc/fortran/ChangeLog:
* symbol.cc (gfc_get_procedure_ns): Try to find the namespace
among the list of contained namespaces before returning the
value from the formal_ns field.
Andrew Pinski [Sat, 20 Sep 2025 03:58:31 +0000 (20:58 -0700)]
gimple-fold/fab: Move ASSUME_ALIGNED handling to gimple-fold [PR121762]
This is the next patch in the series of removing fab.
This one is simpler than builtin_constant_p because the only
time we want to simplify this builtin is at the final folding step.
Note align-5.c needs to change slightly as __builtin_assume_aligned
is no longer taken into account for the same reason as why PR 111875
is closed as invalid and why the testcase is failing at -Og.
I added a new testcase align-5a.c where the pointer is explicitly aligned
so that the check is gone there.
Note __builtin_assume_aligned should really be instrumented for UBSAN,
I filed PR 122038 for that.
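For readers unfamiliar with the builtin, here is a small usage sketch. It is not taken from align-5.c or align-5a.c; the function names are hypothetical and the assert is an ordinary run-time check added by the example, since the builtin itself only conveys an assumption to the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sums n floats, promising the compiler that p is 16-byte aligned.
// __builtin_assume_aligned is a GCC/Clang builtin.
float sum_aligned16(const float* p, std::size_t n)
{
  assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0); // our own check
  const float* q
    = static_cast<const float*>(__builtin_assume_aligned(p, 16));
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    sum += q[i];
  return sum;
}

float sum_aligned16_demo()
{
  // The data is explicitly aligned, so the assumption actually holds.
  alignas(16) static const float data[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  return sum_aligned16(data, 4);
}
```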
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/121762
gcc/ChangeLog:
* gimple-fold.cc (gimple_fold_builtin_assume_aligned): New function.
(gimple_fold_builtin): Call gimple_fold_builtin_assume_aligned
for BUILT_IN_ASSUME_ALIGNED.
* tree-ssa-ccp.cc (pass_fold_builtins::execute): Remove handling
of BUILT_IN_ASSUME_ALIGNED.
gcc/testsuite/ChangeLog:
* c-c++-common/ubsan/align-5.c: Update as __builtin_assume_aligned
is no longer taken into account.
* c-c++-common/ubsan/align-5a.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jennifer Schmitz [Fri, 13 Jun 2025 08:19:43 +0000 (01:19 -0700)]
AArch64: Enable dispatch scheduling for Neoverse V2.
This patch adds dispatch constraints for Neoverse V2 and illustrates the steps
necessary to enable dispatch scheduling for an AArch64 core.
The dispatch constraints are based on section 4.1 of the Neoverse V2 SWOG.
Please note that the values used here deviate slightly from the current SWOG
version but are based on correct numbers. Arm will do an official Neoverse V2
SWOG release with the updated values in due time.
These are the steps we followed to implement the dispatch constraints for
Neoverse V2:
1. We used instruction attributes to group instructions into dispatch groups,
corresponding to operations that utilize a certain pipeline type. For that,
we added a new attribute (neoversev2_dispatch) with values for the
different dispatch groups. The values of neoversev2_dispatch are determined
using expressions of other instruction attributes.
For example, the SWOG describes a constraint of "Up to 4 uOPs utilizing the
M pipelines". Thus, one of the values of neoversev2_dispatch is "m" and it
groups instructions that use the M pipelines such as integer multiplication.
Note that we made some minor simplifications compared to the information
in the SWOG, because the instruction annotation does not allow for a fully
accurate mapping of instructions to utilized pipelines. To give one example,
the instructions IRG and LDG are both tagged with "memtag", but IRG uses
the M pipelines, while LDG uses the L pipelines.
2. In the Neoverse V2 tuning model, we added an array of available slots per
dispatch constraint and a callback function that takes an insn as
input and returns a vector of pairs (a, b) where a is an index in the
array of slots and b is the number of occupied slots. The callback
function calls get_attr_neoversev2_dispatch(insn) and switches over the
result values to create a vector of occupied slots.
Thus, the new attribute neoversev2_dispatch provides a compact way to define
the dispatch constraints.
The array of available slots, its length, and a pointer to the
callback function are collected in a struct dispatch_constraint_info
which is referenced in the tune_params.
3. We enabled dispatch scheduling for Neoverse V2 by adding the
AARCH64_EXTRA_TUNE_DISPATCH_SCHED tune flag.
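The shape of the slots array and callback from step 2 might look roughly like the sketch below. Everything here is hypothetical: the enum, the two-entry layout and the total-slot count are placeholders (only the "up to 4 uOPs" M-pipeline limit comes from the text above), and the dispatch group is passed in directly instead of being read with get_attr_neoversev2_dispatch (insn).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical dispatch groups; only "m" is named in the text above.
enum class dispatch_group_sketch { none, m };

// Index of each constraint in the slots array (illustrative layout).
enum constraint_index : std::size_t
{
  total_slots = 0,
  m_slots = 1,
  num_constraints = 2
};

// Available slots per constraint. The M-pipeline value follows the
// "up to 4 uOPs" example; the total is a placeholder, not a real figure.
constexpr int available_slots_sketch[num_constraints] = { 8, 4 };

using slot_pairs = std::vector<std::pair<std::size_t, int>>;

// Stand-in for the tuning-model callback: maps a dispatch group to
// (constraint index, occupied slots) pairs.
slot_pairs occupied_slots_sketch(dispatch_group_sketch group)
{
  slot_pairs result;
  // Assumption of this sketch: every instruction takes one overall slot.
  result.emplace_back(total_slots, 1);
  switch (group)
    {
    case dispatch_group_sketch::m:
      result.emplace_back(m_slots, 1);
      assert(result.back().second <= available_slots_sketch[m_slots]);
      break;
    case dispatch_group_sketch::none:
      break;
    }
  return result;
}
```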
Performance evaluation showed no regression in several different
workloads including SPEC2017 and GROMACS2024.
Thank you, Tamar, for helping with performance evaluation.
The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/ChangeLog:
* config/aarch64/aarch64.md: Include neoversev2.md.
* config/aarch64/tuning_models/neoversev2.h: Enable dispatch
scheduling and add dispatch constraints.
* config/aarch64/neoversev2.md: New file and new instruction attribute
neoversev2_dispatch.
Jennifer Schmitz [Wed, 17 Sep 2025 10:22:12 +0000 (03:22 -0700)]
AArch64: Implement target hooks for dispatch scheduling.
This patch adds dispatch scheduling for AArch64 by implementing the two target
hooks TARGET_SCHED_DISPATCH and TARGET_SCHED_DISPATCH_DO.
The motivation for this is that cores with out-of-order processing do
most of the reordering to avoid pipeline hazards on the hardware side
using large reorder buffers. For such cores, rather than scheduling
around instruction latencies and throughputs, the compiler should aim to
maximize the utilized dispatch bandwidth by inserting a certain
instruction mix into the frontend dispatch window.
In the following, we will describe the overall implementation:
Recall that the Haifa scheduler makes the following 6 types of queries to
a dispatch scheduling model:
1) targetm.sched.dispatch (NULL, IS_DISPATCH_ON)
2) targetm.sched.dispatch_do (NULL, DISPATCH_INIT)
3) targetm.sched.dispatch (insn, FITS_DISPATCH_WINDOW)
4) targetm.sched.dispatch_do (insn, ADD_TO_DISPATCH_WINDOW)
5) targetm.sched.dispatch (NULL, DISPATCH_VIOLATION)
6) targetm.sched.dispatch (insn, IS_CMP)
For 1), we created the new tune flag AARCH64_EXTRA_TUNE_DISPATCH_SCHED.
For 2-5), we modeled dispatch scheduling using the class dispatch_window.
A dispatch_window object represents the window of operations that is dispatched
per cycle. It contains the two arrays max_slots and free_slots (the length
of the arrays is the number of dispatch constraints specified for a core)
to keep track of the available slots.
The dispatch_window class exposes functions to ask whether a given
instruction would fit into the dispatch_window or to add an instruction to
the window.
The model operates using only one dispatch_window object that is constructed
when 2) is called. Upon construction, it copies the number of available slots
given in the tuning model (more details on the changes to tune_params below).
During scheduling, instructions are added according to the dispatch
constraints. For that, the dispatch_window queries the tuning model using a
callback function that takes an insn as input and returns a vector of
pairs (a, b), where a is the index of the constraint and b is the number of
slots occupied.
The dispatch_window checks if the instruction fits into the current
window. If not, i.e. the current window is full, the free_slots array is
reset to max_slots. Then the dispatch_window deducts b slots from
free_slots[a] for each pair (a, b) in the vector returned by the callback.
A dispatch violation occurs when the number of free slots becomes negative
for any dispatch_constraint.
For 6), return false (see comment in aarch64-sched-dispatch.cc).
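The bookkeeping described for 2-5) can be sketched as a small class. This is a simplified model written for this description, not the code in aarch64-sched-dispatch.cc: the class name, the accessor and the helper free_slots_after are invented, and insns are represented only by their (a, b) pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// (constraint index a, occupied slots b) pairs for one instruction.
using slot_usage = std::vector<std::pair<std::size_t, int>>;

class dispatch_window_sketch
{
  std::vector<int> max_slots_;
  std::vector<int> free_slots_;

public:
  explicit dispatch_window_sketch(const std::vector<int>& max_slots)
    : max_slots_(max_slots), free_slots_(max_slots) {}

  // Would the instruction fit into the current window?
  bool fits(const slot_usage& usage) const
  {
    for (const auto& u : usage)
      if (free_slots_[u.first] < u.second)
        return false;
    return true;
  }

  // Add an instruction; start a fresh window first if it does not fit.
  void add(const slot_usage& usage)
  {
    if (!fits(usage))
      free_slots_ = max_slots_;
    for (const auto& u : usage)
      {
        assert(u.first < free_slots_.size());
        free_slots_[u.first] -= u.second;
      }
  }

  // A violation means some constraint has fewer than zero free slots.
  bool violation() const
  {
    for (int s : free_slots_)
      if (s < 0)
        return true;
    return false;
  }

  int free_slots(std::size_t index) const { return free_slots_[index]; }
};

// Feeds insn_count instructions, each occupying per_insn slots of a single
// constraint with `capacity` slots, and returns the free slots afterwards.
int free_slots_after(int capacity, int per_insn, int insn_count)
{
  const std::vector<int> max_slots(1, capacity);
  dispatch_window_sketch window(max_slots);
  const slot_usage usage(1, std::make_pair(std::size_t(0), per_insn));
  for (int i = 0; i < insn_count; ++i)
    window.add(usage);
  return window.free_slots(0);
}
```

For instance, with 4 slots and instructions that take 3 slots each, the second instruction does not fit, so the window is reset before it is added and 1 slot remains free.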
Dispatch information for a core can be added in its tuning model. We added
the new field *dispatch_constraint to the struct tune_params that holds a
pointer to a struct dispatch_constraints_info.
All current tuning models were initialized with nullptr.
(In the next patch, dispatch information will be added for Neoverse V2.)
The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/ChangeLog:
* config.gcc: Add aarch64-sched-dispatch.o to extra_objs.
* config/aarch64/aarch64-protos.h (struct tune_params): New
field for dispatch scheduling.
(struct dispatch_constraint_info): New struct for dispatch scheduling.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNING_OPTION): New flag to enable dispatch scheduling.
* config/aarch64/aarch64.cc (TARGET_SCHED_DISPATCH): Implement
target hook.
(TARGET_SCHED_DISPATCH_DO): Likewise.
(aarch64_override_options_internal): Add check for definition of
dispatch constraints if dispatch-scheduling tune flag is set.
* config/aarch64/t-aarch64: Add aarch64-sched-dispatch.o.
* config/aarch64/tuning_models/a64fx.h: Initialize fields for
dispatch scheduling in tune_params.
* config/aarch64/tuning_models/ampere1.h: Likewise.
* config/aarch64/tuning_models/ampere1a.h: Likewise.
* config/aarch64/tuning_models/ampere1b.h: Likewise.
* config/aarch64/tuning_models/cortexa35.h: Likewise.
* config/aarch64/tuning_models/cortexa53.h: Likewise.
* config/aarch64/tuning_models/cortexa57.h: Likewise.
* config/aarch64/tuning_models/cortexa72.h: Likewise.
* config/aarch64/tuning_models/cortexa73.h: Likewise.
* config/aarch64/tuning_models/cortexx925.h: Likewise.
* config/aarch64/tuning_models/emag.h: Likewise.
* config/aarch64/tuning_models/exynosm1.h: Likewise.
* config/aarch64/tuning_models/fujitsu_monaka.h: Likewise.
* config/aarch64/tuning_models/generic.h: Likewise.
* config/aarch64/tuning_models/generic_armv8_a.h: Likewise.
* config/aarch64/tuning_models/generic_armv9_a.h: Likewise.
* config/aarch64/tuning_models/neoverse512tvb.h: Likewise.
* config/aarch64/tuning_models/neoversen1.h: Likewise.
* config/aarch64/tuning_models/neoversen2.h: Likewise.
* config/aarch64/tuning_models/neoversen3.h: Likewise.
* config/aarch64/tuning_models/neoversev1.h: Likewise.
* config/aarch64/tuning_models/neoversev2.h: Likewise.
* config/aarch64/tuning_models/neoversev3.h: Likewise.
* config/aarch64/tuning_models/neoversev3ae.h: Likewise.
* config/aarch64/tuning_models/olympus.h: Likewise.
* config/aarch64/tuning_models/qdf24xx.h: Likewise.
* config/aarch64/tuning_models/saphira.h: Likewise.
* config/aarch64/tuning_models/thunderx.h: Likewise.
* config/aarch64/tuning_models/thunderx2t99.h: Likewise.
* config/aarch64/tuning_models/thunderx3t110.h: Likewise.
* config/aarch64/tuning_models/thunderxt88.h: Likewise.
* config/aarch64/tuning_models/tsv110.h: Likewise.
* config/aarch64/tuning_models/xgene1.h: Likewise.
* config/aarch64/aarch64-sched-dispatch.cc: New file for
dispatch scheduling for aarch64.
* config/aarch64/aarch64-sched-dispatch.h: New header file.
Jennifer Schmitz [Fri, 11 Jul 2025 13:07:30 +0000 (06:07 -0700)]
AArch64: Annotate SVE instructions with new instruction attribute.
In this patch, we add the new instruction attribute "sve_type" and use it to
annotate the SVE instructions in aarch64-sve.md and aarch64-sve2.md. This
allows us to use instruction attributes to group instructions into dispatch
groups for dispatch scheduling. While there had already been fine-grained
annotation of scalar and neon instructions (mostly using the "type"-attribute),
annotation was missing for SVE instructions.
The values of the attribute "sve_type" are comparatively coarse-grained, but
fulfill the two criteria we aimed for with regard to dispatch scheduling:
- the annotation allows the definition of CPU-specific high-level attributes
mapping instructions to dispatch constraints
- the annotation is by itself CPU-independent and consistent, i.e. all
instructions fulfilling certain criteria are tagged with the corresponding
value
The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md: Annotate instructions with
attribute sve_type.
* config/aarch64/aarch64-sve2.md: Likewise.
* config/aarch64/aarch64.md (sve_type): New attribute sve_type.
* config/aarch64/iterators.md (sve_type_unspec): New int attribute.
(sve_type_int): New code attribute.
(sve_type_fp): New int attribute.
libstdc++: Move test for __cpp_lib_not_fn to version.cc
When running the tests without pre-compiled headers
(--disable-libstdcxx-pch), the test fails, because the feature
testing macro (FTM) isn't defined yet.
This commit moves checking the FTM to a dedicated file (version.cc)
that's run without PCH.
libstdc++-v3/ChangeLog:
* testsuite/20_util/function_objects/not_fn/nttp.cc: Move
test of feature testing macro to version.cc
* testsuite/20_util/function_objects/not_fn/version.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Co-authored-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Richard Biener [Wed, 24 Sep 2025 10:19:17 +0000 (12:19 +0200)]
tree-optimization/116816 - improve VMAT_ELEMENTWISE with SLP
The following implements VMAT_ELEMENTWISE for grouped loads, in
particular for being able to serve as fallback for unhandled
load permutations since it's trivial to load elements in the
correct order.
PR tree-optimization/116816
* tree-vect-stmts.cc (get_load_store_type): Allow multi-lane
single-element interleaving to fall back to VMAT_ELEMENTWISE.
Fall back to VMAT_ELEMENTWISE when we cannot handle a load
permutation.
(vectorizable_load): Do not check a load permutation
for VMAT_ELEMENTWISE. Handle grouped loads with
VMAT_ELEMENTWISE and directly apply a load permutation.
We may not classify a BB vectorization load as VMAT_ELEMENTWISE as
that will ICE. Instead we build vectors from existing scalar loads.
Make that explicit.
* tree-vect-stmts.cc (get_load_store_type): Explicitly fail
when we end up with VMAT_ELEMENTWISE for BB vectorization.
Tomasz Kamiński [Tue, 23 Sep 2025 11:56:42 +0000 (13:56 +0200)]
libstdc++: Reflect operator<< constraints in formatter for local_time.
The r16-3996-gdc78d691c5e5f7 commit (resolution of LWG4257) constrained the
operator<< for local_time, but didn't update the corresponding formatter. This
meant it didn't conditionally support formatting with an empty format spec ("{}"),
which is defined in terms of operator<<.
This patch addresses that by initializing __defSpec for the local_time
formatter in the same manner as it's done for sys_time. This functionality is
extracted to the _S_spec_for_tp function of __formatter_duration. As formatting
of local_time is defined and constrained in terms of operator<< for sys_time,
we can check the viability of the ostream operator for sys_time in both cases.
As the default _M_chrono_spec may now be empty for local_time, the parse method
now checks if it was supplied in the format string, similarly to sys_time. The
condition for performing the runtime check is expressed directly by checking if an
empty default is provided. This avoids the need to access the value of
__stream_insertable outside of the __defSpec computation.
As a note, despite their similar behavior, the formatters for sys_time and local_time
cannot be easily defined in terms of each other, as sys_time provides time zone
information while local_time does not.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_duration::_S_spec_for_tp):
Extracted from definition of formatter<sys_time>::__defSpec.
(formatter<chrono::sys_time<_Duration>, _CharT>::parse): Simplify
condition in if constexpr.
(formatter<chrono::sys_time<_Duration>, _CharT>::__stream_insertable):
Remove.
(formatter<chrono::sys_time<_Duration>, _CharT>::__defSpec)
(formatter<chrono::local_time<_Duration>, _CharT>::__defSpec):
Compute using __formatter_duration::_S_spec_for_tp.
(formatter<chrono::sys_time<_Duration>, _CharT>::parse): Check if
parse _M_chrono_spec
* testsuite/std/time/format/empty_spec.cc: Extend tests for floating
point and other non-streamable durations (years).
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Tue, 23 Sep 2025 09:09:08 +0000 (11:09 +0200)]
libstdc++: Use basic_format_parse_context<_CharT> in internal chrono formatters.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_parse):
Replace _ParseContext with basic_format_parse_context<_CharT> and
make it non-template.
(__formatter_duration::_M_parse): Replace _ParseContext with
basic_format_parse_context<_CharT> and remove unused default
argument.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Xi Ruoyao [Wed, 24 Sep 2025 06:26:11 +0000 (14:26 +0800)]
docs: Note that -fisolate-erroneous-paths-dereference turns division by zero into a trap [PR 122040]
And this behavior is not limited to -fdelete-null-pointer-checks.
gcc/
PR tree-optimization/122040
* doc/invoke.texi (-fisolate-erroneous-paths-dereference):
Mention it also turns division by zero into a trap in addition
to null dereference.
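A minimal C sketch of the kind of code this documentation change is about. The function name and values are made up for illustration. On the path where use_default is nonzero the divisor is a known zero, so with the option enabled GCC may isolate that path and replace it with a trap rather than emit the division; the exact transformation depends on the optimization level and GCC version.

```c
/* Hypothetical example: one incoming path makes the divisor zero.  */
int
scale (int value, int use_default)
{
  int divisor = use_default ? 0 : 4;
  /* Undefined behavior when use_default != 0; that path is the
     candidate for being isolated and turned into a trap.  */
  return value / divisor;
}
```

Only calls with use_default == 0 have defined behavior, for example scale (8, 0).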
Xi Ruoyao [Tue, 16 Sep 2025 15:10:26 +0000 (23:10 +0800)]
LoongArch: Add isnan expander [PR 66462]
Add an expander for isnan using fclass. Since isnan is
just a compare, enable it only with -fsignaling-nans to avoid
generating spurious exceptions. This fixes part of PR66462.
int isnan1 (float x) { return __builtin_isnan (x); }
With -fno-signaling-nans:
fcmp.cun.s $fcc0,$f0,$f0
movcf2fr $f0,$fcc0
movfr2gr.s $r4,$f0
jr $r1
With -fsignaling-nans:
fclass.s $f0,$f0
movfr2gr.s $r4,$f0
andi $r4,$r4,3
sltu $r4,$r0,$r4
jr $r1
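The -fsignaling-nans sequence above can be modelled in portable C as a sketch. It assumes that the two low bits of the fclass result flag signaling and quiet NaNs respectively (which is what the `andi $r4,$r4,3` suggests) and that a set top fraction bit marks a quiet NaN. The helper names are hypothetical. Like fclass, the code inspects only the bit pattern and performs no floating-point comparison, so it cannot raise an exception on a signaling NaN.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the low two fclass bits for a float:
   bit 0 = signaling NaN, bit 1 = quiet NaN (assumed layout).  */
static unsigned
nan_class_bits (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);      /* read the encoding, no FP compare */
  uint32_t exponent = (bits >> 23) & 0xff;
  uint32_t fraction = bits & 0x7fffff;
  if (exponent != 0xff || fraction == 0)
    return 0;                           /* finite or infinite: not a NaN */
  return (fraction & 0x400000) ? 2u : 1u;
}

/* Mirrors "andi 3; sltu": nonzero if either NaN bit is set.  */
static int
isnan_by_class (float x)
{
  return (nan_class_bits (x) & 3u) != 0;
}
```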
PR middle-end/66462
gcc/
* config/loongarch/loongarch.md (FCLASS_MASK): Add 3.
(fclass_optab): Assign isnan for 3.
(<FCLASS_MASK:fclass_optab><ANYF:mode>2): If FCLASS_MASK is 3,
only enable when -fsignaling-nans.
c++/modules: Fix language linkage handling [PR122019]
The ICE in the linked PR is caused because when current_lang_name is
lang_name_c, set_decl_linkage calls decl_linkage on the retrofitted
declaration. This is problematic because at this point we haven't
finished streaming the declaration, and so we crash when attempting to
access these missing bits (such as the type).
The only declarations we can reach here will be things like types that
don't get a language linkage anyway, so it seems reasonable to just
hardcode a C++ language linkage here to work around the issue.
An alternative fix would be to override current_lang_name in
lazy_load_binding instead, but this is potentially confusing if we ever
deliberately implement `extern "C" import "header_unit.hpp";` to
override the language linkage of imported declarations, so I went with
this approach instead. (Though it seems we never will do this.)
While testing this I found that we don't currently complain about
mismatching language linkages for variables, and module_may_redeclare
doesn't cope well with implementation units, so this patch also fixes
those issues.
PR c++/122019
gcc/cp/ChangeLog:
* module.cc (trees_in::install_entity): Don't be affected by
global language linkage state.
(trees_in::is_matching_decl): Check mismatching language linkage
for variables too.
(module_may_redeclare): Report the correct module name for
partitions and implementation units.
gcc/testsuite/ChangeLog:
* g++.dg/modules/lang-4_a.C: New test.
* g++.dg/modules/lang-4_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Paul Thomas [Wed, 24 Sep 2025 07:01:23 +0000 (08:01 +0100)]
Fortran: Fix ICE in check_interface [PR87908]
2025-09-24 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/87908
* interface.cc (check_interface0): If a vtable is found in the
interface list, check that it is either a subroutine or a
function. Let resolve.cc do any further checking.
gcc/testsuite/
PR fortran/87908
* gfortran.dg/pr87908.f90: New test.
Patrick Palka [Wed, 24 Sep 2025 02:41:26 +0000 (22:41 -0400)]
libstdc++/testsuite: Unpoison 'u' on s390x in names.cc test
This is the s390 counterpart to r11-7364-gd0453cf5c68b6a, and fixes the
following names.cc failure caused by a use of a poisoned identifier.
If we look at the corresponding upstream header[1] it's clear that the
problematic identifier is 'u'.
In file included from /usr/include/linux/types.h:5,
from /usr/include/linux/sched/types.h:5,
from /usr/include/bits/sched.h:61,
from /usr/include/sched.h:43,
from /usr/include/pthread.h:22,
from /usr/include/c++/14/s390x-redhat-linux/bits/gthr-default.h:35,
from /usr/include/c++/14/s390x-redhat-linux/bits/gthr.h:157,
from /usr/include/c++/14/ext/atomicity.h:35,
from /usr/include/c++/14/bits/ios_base.h:39,
from /usr/include/c++/14/streambuf:43,
from /usr/include/c++/14/bits/streambuf_iterator.h:35,
from /usr/include/c++/14/iterator:66,
from /usr/include/c++/14/s390x-redhat-linux/bits/stdc++.h:54,
from /root/rpmbuild/BUILD/gcc-14.3.1-20250617/libstdc++-v3/testsuite/17_intro/names.cc:384:
/usr/include/asm/types.h:24: error: expected unqualified-id before '[' token
/usr/include/asm/types.h:24: error: expected ')' before '[' token
/root/rpmbuild/BUILD/gcc-14.3.1-20250617/libstdc++-v3/testsuite/17_intro/names.cc:101: note: to match this '('
compiler exited with status 1
FAIL: 17_intro/names.cc -std=gnu++98 (test for excess errors)
Excess errors:
/usr/include/asm/types.h:24: error: expected unqualified-id before '[' token
/usr/include/asm/types.h:24: error: expected ')' before '[' token
Pan Li [Fri, 19 Sep 2025 06:54:48 +0000 (14:54 +0800)]
RISC-V: Add test case of unsigned scalar SAT_MUL form 5 for mul
Add test case for both the run and asm check of mul based SAT_MUL.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_u_mul-6-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-6-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-6-u16-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-6-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-6-u32-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-6-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-6-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-6-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-6-u8-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-6-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-6-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-6-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-6-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-6-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-6-u8-from-u64.c: New test.
Pan Li [Fri, 19 Sep 2025 06:54:47 +0000 (14:54 +0800)]
Match: Add form 5 of unsigned SAT_MUL for mul
This patch tries to match form 5 of the unsigned SAT_MUL,
shown below:
#define DEF_SAT_U_MUL_FMT_5(NT, WT) \
NT __attribute__((noinline)) \
sat_u_mul_##NT##_from_##WT##_fmt_5 (NT a, NT b) \
{ \
WT x = (WT)a * (WT)b; \
NT hi = x >> (sizeof(NT) * 8); \
NT lo = (NT)x; \
return lo | -!!hi; \
}
where WT is uint16_t, uint32_t or uint64_t,
and NT is uint8_t, uint16_t or uint32_t.
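As an illustration, the macro can be instantiated for two type pairs as below. The macro body is taken from this message; the chosen instantiations are just examples. When the widened product overflows NT, hi is nonzero, -!!hi is all ones, and the OR saturates the result to the maximum value of NT.

```c
#include <stdint.h>

#define DEF_SAT_U_MUL_FMT_5(NT, WT)                \
NT __attribute__((noinline))                       \
sat_u_mul_##NT##_from_##WT##_fmt_5 (NT a, NT b)    \
{                                                  \
  WT x = (WT)a * (WT)b;          /* widened product */              \
  NT hi = x >> (sizeof(NT) * 8); /* bits that do not fit in NT */   \
  NT lo = (NT)x;                 /* truncated product */            \
  return lo | -!!hi;             /* all ones if hi is nonzero */    \
}

/* Example instantiations (chosen for illustration).  */
DEF_SAT_U_MUL_FMT_5 (uint8_t, uint16_t)
DEF_SAT_U_MUL_FMT_5 (uint16_t, uint32_t)
```

For instance, sat_u_mul_uint8_t_from_uint16_t_fmt_5 (16, 16) saturates to 255 because 256 does not fit in uint8_t, while (3, 5) yields the exact product 15.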
gcc/ChangeLog:
* match.pd: Add pattern of mult and reuse the widen-mul
by for keyword.
David Malcolm [Tue, 23 Sep 2025 20:38:37 +0000 (16:38 -0400)]
sarif output: add descriptions to fix-it hints (§3.55.2) [PR121986]
SARIF "fix" objects SHOULD have a "description" property (§3.55.2) that
describes the proposed fix, but currently GCC's SARIF output doesn't
support this, and we don't capture this anywhere internally as we build
fix-it hints in the compiler.
Currently we can have zero or more instances of fixit_hint associated
with a diagnostic, each representing an edit of a range of the source
code. Ideally we would have an internal API that allowed for associating
multiple fixes with a diagnostic, each with a description worded in terms
of the source language (e.g. "Fix 'colour' misspelling of field 'color'"),
and each consisting of multiple edited ranges.
For now, this patch extends the sarif output sink so that it
autogenerates descriptions of fix-it hints for simple cases of
insertion, deletion, and replacement of a single range
(e.g. "Replace 'colour' with 'color'").
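The idea of auto-generating a description for a single-range hint can be sketched in C as follows. This is not the code in sarif-sink.cc: the struct, the function name, and the wording used for insertions and deletions are assumptions made for illustration. Only the "Replace ... with ..." wording appears in this message.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a single-range fix-it hint.  */
struct simple_fixit
{
  const char *old_text;   /* text in the edited range; "" for an insertion */
  const char *new_text;   /* replacement text; "" for a deletion */
};

/* Write a description of HINT into BUF (LEN bytes).  Return 0 on
   success, -1 if the hint is empty or the text does not fit.  */
int
describe_fixit (const struct simple_fixit *hint, char *buf, size_t len)
{
  size_t old_len = strlen (hint->old_text);
  size_t new_len = strlen (hint->new_text);
  int n;

  if (old_len == 0 && new_len == 0)
    return -1;                      /* nothing to describe */
  if (old_len == 0)                 /* assumed wording for insertion */
    n = snprintf (buf, len, "Insert '%s'", hint->new_text);
  else if (new_len == 0)            /* assumed wording for deletion */
    n = snprintf (buf, len, "Delete '%s'", hint->old_text);
  else
    n = snprintf (buf, len, "Replace '%s' with '%s'",
                  hint->old_text, hint->new_text);
  return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```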
gcc/ChangeLog:
PR diagnostics/121986
* diagnostics/sarif-sink.cc: Include "intl.h".
(sarif_builder::make_message_describing_fix_it_hint): New.
(sarif_builder::make_fix_object): Attempt to auto-generate a
description for fix-it hints.
gcc/testsuite/ChangeLog:
PR diagnostics/121986
* gcc.dg/sarif-output/extra-semicolon.c: New test.
* gcc.dg/sarif-output/extra-semicolon.py: New test.
* gcc.dg/sarif-output/missing-semicolon.py: Verify the description
of the insertion fix-it hint.
* libgdiagnostics.dg/test-fix-it-hint-c.py: Verify the description
of the replacement fix-it hint.
libcpp/ChangeLog:
PR diagnostics/121986
* include/rich-location.h (fixit_hint::deletion_p): New accessor.
(fixit_hint::replacement_p): New accessor.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jonathan Wakely [Mon, 8 Sep 2025 20:53:33 +0000 (21:53 +0100)]
libstdc++: Refactor std::philox_engine member functions
libstdc++-v3/ChangeLog:
* include/bits/random.h: Include <bits/ios_base.h> instead of
<iomanip>. Change preprocessor checks to use internal feature
test macro.
(philox_engine): Reword doxygen comments. Use typename instead
of class in template parameter lists. Reformat and adjust
whitespace.
(philox_engine::_If_seed_seq): Replace alias template with
__is_seed_seq variable template.
(philox_engine::philox_engine(result_type)): Define inline.
(philox_engine::seed(result_type), philox_engine::set_counter)
(philox_engine::operator(), philox_engine::discard): Likewise.
(operator==): Define as defaulted.
(operator<<): Reuse widened char.
* include/bits/random.tcc: Reformat and adjust whitespace.
(philox_engine::_M_philox): Use std::array copy constructor and
std::array::fill instead of looping.
* testsuite/26_numerics/random/philox4x32.cc: Gate test on
feature test macro. Add static_assert to check typedef.
* testsuite/26_numerics/random/philox4x64.cc: Likewise.
* testsuite/26_numerics/random/philox_engine/cons/copy.cc: Add
VERIFY assertions to check copies are equal. Test different
seeds.
* testsuite/26_numerics/random/philox_engine/cons/default.cc:
Add VERIFY assertions to check construction results.
* testsuite/26_numerics/random/philox_engine/cons/seed.cc:
Likewise.
* testsuite/26_numerics/random/philox_engine/operators/equal.cc:
Also test inequality.
* testsuite/26_numerics/random/philox_engine/operators/serialize.cc:
Remove redundant include and return.
* testsuite/26_numerics/random/philox_engine/requirements/constants.cc:
Check values of all constants.
* testsuite/26_numerics/random/philox_engine/requirements/typedefs.cc:
Check typedefs are correct.
* testsuite/26_numerics/random/philox_engine/cons/119794.cc:
Removed.
* testsuite/26_numerics/random/philox_engine/cons/seed_seq.cc:
Removed.
* testsuite/26_numerics/random/philox_engine/operators/inequal.cc:
Removed.
* testsuite/26_numerics/random/philox_engine/requirements/constexpr_data.cc:
Removed.
* testsuite/26_numerics/random/philox_engine/requirements/constexpr_functions.cc:
Removed.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
line number.
1nfocalypse [Tue, 5 Aug 2025 01:37:18 +0000 (01:37 +0000)]
libstdc++: Implement Philox Engine (PR119794)
Conforms with errata LWG4143, LWG4153 for Philox Engine.
PR libstdc++/119794
libstdc++-v3/ChangeLog:
* include/bits/random.h (philox_engine): Define.
* include/bits/random.tcc (philox_engine): Define member
functions.
* include/bits/version.def (philox_engine): New macro.
* include/bits/version.h: Regenerated.
* include/std/random: Define __glibcxx_want_philox_engine and
include <bits/version.h>.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
line number.
* testsuite/26_numerics/random/philox4x32.cc: New test.
* testsuite/26_numerics/random/philox4x64.cc: New test.
* testsuite/26_numerics/random/philox_engine/cons/119794.cc: New test.
* testsuite/26_numerics/random/philox_engine/cons/copy.cc: New test.
* testsuite/26_numerics/random/philox_engine/cons/default.cc: New test.
* testsuite/26_numerics/random/philox_engine/cons/seed.cc: New test.
* testsuite/26_numerics/random/philox_engine/cons/seed_seq.cc: New test.
* testsuite/26_numerics/random/philox_engine/operators/equal.cc: New test.
* testsuite/26_numerics/random/philox_engine/operators/inequal.cc: New test.
* testsuite/26_numerics/random/philox_engine/operators/serialize.cc: New test.
* testsuite/26_numerics/random/philox_engine/requirements/constants.cc: New test.
* testsuite/26_numerics/random/philox_engine/requirements/constexpr_data.cc: New test.
* testsuite/26_numerics/random/philox_engine/requirements/constexpr_functions.cc: New test.
* testsuite/26_numerics/random/philox_engine/requirements/typedefs.cc: New test.
Ben Wu [Fri, 19 Sep 2025 00:25:41 +0000 (17:25 -0700)]
libstdc++: fix element construction in std::deque::emplace [PR118087]
In order to emplace a value in the middle of a deque, a temporary was
previously constructed directly with __args... in _M_emplace_aux.
This would not work since std::deque is allocator-aware and should
construct elements with _Alloc_traits::construct instead before the
element is moved.
Using the suggestion in PR118087, we can define _Temporary_value
similar to the one used in std::vector, so the temporary can be
constructed with uses-allocator construction.
PR libstdc++/118087
libstdc++-v3/ChangeLog:
* include/bits/deque.tcc: Use _Temporary_value in
_M_emplace_aux.
* include/bits/stl_deque.h: Introduce _Temporary_value.
* testsuite/23_containers/deque/modifiers/emplace/118087.cc:
New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Ben Wu <soggysocks206@gmail.com>
arm: mve: fix out-of range literal pool for a const_vector [PR121810]
For the pattern mve_mov<mode>, the alternative that describes literal
pool support is incorrect. This leads to the compiler getting the
calculations wrong for the available distance to the next pool
fragment because the selected alternative is a shorter sequence than
the correct one. In particular the sequence generated for a 128-bit
constant is
vldr.64 d0, Pool // Insn length 4, alternative 7 (part 1)
vldr.64 d1, Pool+8 // Insn length 4, alternative 7 (part 2)
Note that the second instruction needs 4 bytes more range than the
first because the PC has advanced by 4 bytes, but the next slot in the
pool has advanced by 8.
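The arithmetic behind that note can be written out as a small C sketch. The function names are hypothetical, and the model is simplified: it treats the PC as the instruction address plus a fixed bias and ignores the word alignment of the PC that the hardware applies.

```c
/* Byte offset a PC-relative load at INSN_ADDR needs in order to
   reach TARGET, when the PC reads as INSN_ADDR + PC_BIAS.  */
long
pc_rel_offset (long insn_addr, long target, long pc_bias)
{
  return target - (insn_addr + pc_bias);
}

/* Extra reach the second load of the pair needs compared with the
   first: the instruction is 4 bytes further on, the pool slot 8.  */
long
second_load_extra_range (long insn_addr, long pool_addr, long pc_bias)
{
  long first = pc_rel_offset (insn_addr, pool_addr, pc_bias);
  long second = pc_rel_offset (insn_addr + 4, pool_addr + 8, pc_bias);
  return second - first;
}
```

Whatever the addresses and bias, the difference is 8 - 4 = 4 bytes, so a range computed for the first load alone is 4 bytes too optimistic for the pair.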
The fix is to move the 'Ui' constraint to the correct alternative
and to move the pool-range attributes to that alternative as well.
I've fixed a couple of other nits in this code at the same time:
- the thumb2_neg_pool_range attribute was misnamed (as neg_pool_range),
meaning it was ignored in Thumb state, which is the only time this
pattern is available.
- the load range was not a multiple of 4, which makes no sense for
an insn sequence that is a multiple of 4 bytes long. I've rounded the
value down out of caution, but it may well have been OK with 1020 as
the forward range.
I'm not adding a testcase for this patch; the code to reproduce is
simply too complex to reliably test for a regression.
gcc/ChangeLog:
PR target/121810
* config/arm/mve.md (mve_mov<mode>): Move the Ui constraint
and pool_range attributes to the final alternative. Fix
the forward range value and correctly name the negative
range.
Tomasz Kamiński [Tue, 23 Sep 2025 06:54:28 +0000 (08:54 +0200)]
libstdc++: Make function_ref(nontype<f>, r) CTAD SFINAE friendly [PR121940]
Instantiating the __deduce_funcref function body for function pointers
without arguments or member pointers with non-matching object types
previously led to hard errors due to the formation of invalid types.
The __deduce_funcref function is now adjusted to return void in such
cases. The corresponding function_ref deduction guide is constrained to
only match if the return type is not void, making it SFINAE friendly.
PR libstdc++/121940
libstdc++-v3/ChangeLog:
* include/bits/funcwrap.h (__polyfunc::__deduce_funcref): Return void
for ill-formed constructs.
(function_ref(nontype_t<__f>, _Tp&&)): Constrain on __deduce_funcref
producing non-void results.
* testsuite/20_util/function_ref/deduction.cc: Negative tests.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Bob Duff [Mon, 15 Sep 2025 12:26:03 +0000 (08:26 -0400)]
ada: Fix unnesting problem related to constructors
This patch fixes a bug in unnesting, which is used by the llvm back end.
Exp_Unst relies on the Scope field of nodes to detect up-level
references. Temps created by Prepend_Constructor_Procedure_Prologue
could have an incorrect Scope, causing Exp_Unst to use an up-level
reference to an activation record to implement up-level references. That
won't work; Exp_Unst is supposed to REMOVE up-level references.
This patch corrects the Scope of such temps.
gcc/ada/ChangeLog:
* exp_ch6.adb (Prepend_Constructor_Procedure_Prologue):
Push/Pop the procedure scope, so that temps created herein
get the right Scope.
Jose Ruiz [Tue, 22 Jul 2025 09:22:36 +0000 (11:22 +0200)]
ada: Improve robustness of stack usage tracking in concurrent contexts
Enabled computation of stack usage for tasks that have already
initialized their stacks with the expected fill pattern.
Ensured that stack usage snapshots for tasks are taken while
the runtime is locked, to maintain consistency.
For the environment task, approximated the stack origin using
the topmost stack known address during initialization, and take
into account that the Stack_Analyzer object is not part of its
ATCB.
gcc/ada/ChangeLog:
* libgnarl/s-stusta.adb (Report_Impl): Export a copy of
the current stack usage while holding the runtime lock.
(Report_For_Task): Do not compute stack usage for a task
that has not yet initialized its stack with the expected
pattern.
(Report_For_Task): The Stack_Analyzer object for the
environment task is not part of its ATCB. For the rest of
the tasks wait until we have initialized the stack pattern
before computing stack usage.
(Report_All_Tasks, Get_All_Tasks_Usage,
Get_Current_Task_Usage): Adapt to the new interface from
Report_Impl. Take into account that Result_Array can be
null. When we don't store stack results for a task we
need to compute it when requested.
(Print): Handle the case when we don't know the stack
usage to be reported.
* libgnat/s-stausa.adb (Initialize): For the environment
task, approximate the stack origin with the topmost
stack address that is known.
* libgnat/s-stausa.ads: Clarify comments.
Douglas B Rupp [Thu, 11 Sep 2025 17:13:35 +0000 (10:13 -0700)]
ada: Remove rtp base spec linker option pragma
Remove the linker option pragmas from vxworks7 rtp system specs,
since this is needed only with gnatmake. The same info is contained
in gprbuild, which is the only tool that can be used for building
vxworks apps.
Gary Dismukes [Wed, 10 Sep 2025 19:33:56 +0000 (19:33 +0000)]
ada: New preprocessing option for emitting empty comments for deleted lines
When integrated preprocessing is done on a source file with lines that are
filtered out, by default this can result in multiple blank lines being emitted,
but this can clash with the style switch -gnatyu, which will flag cases of
multiple blank lines. A new preprocessing option "-e" is added to allow
outputting "empty" comment lines rather than blank lines (the comments consist
of "--!" and no other text). This option is also available for the gnatprep
tool. This behavior is the default when integrated preprocessing is done
without generating a ".prep" output file with -gnateG, but when -gnateG is
used, blank lines are still substituted by default for directives and removed
source lines (for compatibility with long-existing behavior). The -gnateG
switch is also extended to allow appending 'b', 'c', or 'e' at the end of
the switch to force any of the replacement options ('b' => blank lines,
'c' => comments including the original source lines, and 'e' => empty
comment lines).
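The three replacement options can be illustrated with a short C sketch. This is not the prep.adb implementation: the function name is hypothetical, and the "--! " prefix used for mode 'c' is an assumption based on how gnatprep marks retained lines. Only the bare "--!" for mode 'e' is stated in this message.

```c
#include <stdio.h>

/* Write the replacement for a removed source LINE into BUF.
   MODE is 'b' (blank line), 'c' (comment keeping the original
   text) or 'e' (empty comment).  Return 0 on success, -1 on an
   unknown mode or a buffer that is too small.  */
int
replace_deleted_line (char mode, const char *line, char *buf, size_t len)
{
  int n;

  switch (mode)
    {
    case 'b':
      n = snprintf (buf, len, "%s", "");
      break;
    case 'c':
      n = snprintf (buf, len, "--! %s", line);   /* assumed prefix */
      break;
    case 'e':
      n = snprintf (buf, len, "--!");
      break;
    default:
      return -1;
    }
  return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

With mode 'e' every removed line becomes the same non-blank "--!" line, so a run of removed lines no longer looks like multiple blank lines to the -gnatyu style check.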
gcc/ada/ChangeLog:
* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Add doc
for addition of -gnateG options b, c, and e.
* doc/gnat_ugn/the_gnat_compilation_model.rst: Add doc for "-e" option
on gnatprep and integrated preprocessing. Add doc for addition of
-gnateG options b, c, and e.
* gprep.adb (Scan_Command_Line): Add 'e' to the list of allowed switch
letters in the string passed to GNAT.Command_Line.Getopt. Set the flag
Opt.Empty_Comment_Deleted_Lines to True when the "-e" switch is found.
(Usage): Output a switch description for the "-e" switch.
* opt.ads: Add new flag variable Empty_Comment_Deleted_Lines. Add
"GNAT" to the "GNATPREP" comment line for Comment_Deleted_Lines.
* prep.adb (Output_Line): Add handling for Empty_Comment_Deleted_Lines,
outputting comment lines consisting of "--!" for lines that are removed
in the preprocessed source file when Empty_Comment_Deleted_Lines is True.
* prepcomp.adb (Preproc_Data): Add Empty_Comments component (defaulting
to False).
(No_Preproc_Data): Add association for Empty_Comments component.
(Parse_Preprocessing_Data_File): Add handling for new switch -e.
(Prepare_To_Preprocess): Add logic for setting the new option
Empty_Comment_Deleted_Lines (and making it the default for
integrated preprocessing in the absence of other switches).
* switch-c.adb (Scan_Front_End_Switches): Add support for appending a single
character 'b', 'c', or 'e' to the "-gnateG" switch, to select any of the
three options for replacing deleted lines in the preprocessing output file.
* usage.adb: Update usage info for -gnateG, to reflect the option of
appending b, c, or e to the switch.
* gnat_ugn.texi: Regenerate.
Javier Miranda [Tue, 2 Sep 2025 12:15:45 +0000 (12:15 +0000)]
ada: Spurious predicate check at default initialization
For an object created by an object_declaration with no explicit
initialization expression, if the type of the object is a record
type (or a private record type) with no components and a dynamic
predicate, then no predicate check must be performed at runtime
(RM 3.2.4(31/5)).
gcc/ada/ChangeLog:
* sem_util.adb (Is_Partially_Initialized_Type): Return False
for record types with no components.
Pan Li [Tue, 23 Sep 2025 01:51:14 +0000 (09:51 +0800)]
Widen-Mul: Fix typo assignment in build_and_insert_cast [PR122031]
The previous fix for PR122021 introduced a typo that assigned the
variable to itself. This patch fixes the typo; sorry for
my silly mistake.
The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.
PR middle-end/122031
gcc/ChangeLog:
* tree-ssa-math-opts.cc (build_and_insert_cast): Fix the typo
of self assignment.
Richard Earnshaw [Mon, 22 Sep 2025 16:29:21 +0000 (17:29 +0100)]
arm: fix target-specific test duplicates for gcc
This patch fixes all the duplicates that I see when testing GCC (C
code). Some of these were real problems with the testsuite where we
were testing the wrong thing because of typos; others are due to
laxity in the tests so that we had a degree of ambiguity in the
results. I've mostly fixed the latter category by converting the
relevant test into a check-function-bodies test.
gcc/testsuite/ChangeLog:
* gcc.target/arm/acle/simd32.c (test_sadd16): Scan for sadd16.
* gcc.target/arm/armv8_2-fp16-neon-1.c (vcgtz, 128-bit): Scan for vcgt.
* gcc.target/arm/armv8_2-fp16-neon-2.c (vcgtz, 128-bit): Scan for vcgt.
(vmul, vmul N): Use check function bodies to avoid ambiguity.
* gcc.target/arm/armv8_2-fp16-scalar-1.c (vrndm): Scan for vrintm.
(vrndn): Scan for vrintn.
(vrndp): Scan for vrintp.
(vrndx): Scan for vrintx.
* gcc.target/arm/asm-flag-1.c: Scan for movlt.
* gcc.target/arm/csneg.c: Convert to check-function-bodies.
* gcc.target/arm/mve/dlstp-compile-asm-2.c (test10): Fix comment that caused test9 scan
to be run twice.
* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: Convert to check-function-bodies.
* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.
Alfie Richards [Thu, 13 Feb 2025 15:59:43 +0000 (15:59 +0000)]
aarch64: testsuite: Add diagnostic tests for Aarch64 FMV.
Add tests covering many FMV errors for Aarch64, including
redeclaration, and mixing target_clones and target_versions.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/mv-and-mvc-error1.C: New test.
* g++.target/aarch64/mv-and-mvc-error2.C: New test.
* g++.target/aarch64/mv-and-mvc-error3.C: New test.
* g++.target/aarch64/mv-error1.C: New test.
* g++.target/aarch64/mv-error2.C: New test.
* g++.target/aarch64/mv-error3.C: New test.
* g++.target/aarch64/mv-error4.C: New test.
* g++.target/aarch64/mv-error5.C: New test.
* g++.target/aarch64/mv-error6.C: New test.
* g++.target/aarch64/mv-error7.C: New test.
* g++.target/aarch64/mv-error8.C: New test.
* g++.target/aarch64/mvc-error1.C: New test.
* g++.target/aarch64/mvc-error2.C: New test.
* g++.target/aarch64/mvc-warning1.C: Modified test.
Alfie Richards [Wed, 6 Aug 2025 11:22:14 +0000 (11:22 +0000)]
fmv: Support mixing of target_clones and target_version.
Add support for a FMV set defined by a combination of target_clones and
target_version definitions.
Additionally, change is_function_default_version to consider a function
declaration annotated with target_clones containing default to be a
default version.
Lastly, add support for the case that a target_clone has all versions filtered
out and therefore the declaration should be removed. This is relevant as now
the default could be defined in a target_version, so a target_clones no longer
necessarily contains the default.
This takes advantage of refactoring done in previous patches changing how
target_clones are expanded and how conflicting decls are handled.
gcc/ChangeLog:
* attribs.cc (is_function_default_version): Update to handle
target_clones.
* cgraph.h (FOR_EACH_FUNCTION_REMOVABLE): New macro.
* multiple_target.cc (expand_target_clones): Update logic to delete
empty target_clones and modify diagnostic.
(ipa_target_clone): Update to use FOR_EACH_FUNCTION_REMOVABLE.
gcc/c-family/ChangeLog:
* c-attribs.cc: Add support for target_version and target_clone mixing.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/mv-and-mvc1.C: New test.
* g++.target/aarch64/mv-and-mvc2.C: New test.
* g++.target/aarch64/mv-and-mvc3.C: New test.
* g++.target/aarch64/mv-and-mvc4.C: New test.
Alfie Richards [Mon, 24 Mar 2025 13:20:01 +0000 (13:20 +0000)]
c++: Refactor FMV frontend conflict and merging logic and hooks.
This change refactors FMV handling in the frontend to allow greater
reasoning about versions in shared code.
This is needed for allowing target_clones and target_versions to be used
together in a function set, as there are then two distinct concerns when
encountering two declarations that previously were conflated:
1. Are these two declarations completely disjoint FMV declarations
(i.e. the sets of versions they define have no overlap)? If so, they don't
conflict, so there is no need to merge and both can be pushed.
2. For two declarations that aren't completely disjoint, are they matching
and therefore mergeable (i.e. two target_clones decls that define the same set
of versions, or an un-annotated declaration and a target_clones definition
containing the default version)? If so, continue to the existing merging logic
to try to merge these and diagnose if it's not possible.
If not, then diagnose the conflicting declarations.
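The two questions above can be modelled in C over plain arrays of version strings. This is a sketch, not the frontend code: all names are hypothetical, and same_version uses exact string equality as a stand-in for the target hook, which on a real target may treat differently spelled strings as the same version.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the target hook: exact string match.  */
static bool
same_version (const char *a, const char *b)
{
  return strcmp (a, b) == 0;
}

static bool
contains_version (const char *const *set, size_t n, const char *v)
{
  for (size_t i = 0; i < n; i++)
    if (same_version (set[i], v))
      return true;
  return false;
}

/* Question 1: do A and B define no version in common?  */
bool
versions_disjoint (const char *const *a, size_t na,
                   const char *const *b, size_t nb)
{
  for (size_t i = 0; i < na; i++)
    if (contains_version (b, nb, a[i]))
      return false;
  return true;
}

/* Question 2: do A and B define exactly the same set of versions?  */
bool
versions_match (const char *const *a, size_t na,
                const char *const *b, size_t nb)
{
  for (size_t i = 0; i < na; i++)
    if (!contains_version (b, nb, a[i]))
      return false;
  for (size_t i = 0; i < nb; i++)
    if (!contains_version (a, na, b[i]))
      return false;
  return true;
}
```

Two declarations whose sets are neither disjoint nor matching, such as {"default", "sve"} and {"sve"}, fall into the final case and would be diagnosed as conflicting.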
To do this, the common_function_versions function has been renamed to
disjoint_function_versions (meaning, are the version sets defined by these
two decls completely distinct from each other).
A new hook called same_function_version is introduced, taking two
string_slices (each representing a single version) and determining if they
define the same version.
A new function, called diagnose_versioned_decls, is added, which checks
if two decls (with overlapping version sets) can be merged and diagnoses when
they cannot be (only in terms of the attributes; the existing logic is used to
detect other mergeability conflicts like redefinition).
This only affects targets with TARGET_HAS_FMV_TARGET_ATTRIBUTE set to false
(i.e. aarch64 and riscv); the existing logic for i86 and ppc is unchanged.
This also means the same function version hook is only used for aarch64 and
riscv.
gcc/ChangeLog:
* attribs.h (common_function_versions): Removed.
* attribs.cc (common_function_versions): Removed.
* config/aarch64/aarch64.cc (aarch64_common_function_versions): Removed.
(aarch64_same_function_versions): New function to check if two version
strings imply the same version.
(TARGET_OPTION_FUNCTION_VERSIONS): Removed.
(TARGET_OPTION_SAME_FUNCTION_VERSIONS): New macro.
* config/i386/i386.cc (TARGET_OPTION_FUNCTION_VERSIONS): Removed.
* config/rs6000/rs6000.cc (TARGET_OPTION_FUNCTION_VERSIONS): Removed.
* config/riscv/riscv.cc (riscv_same_function_versions): New function
to check if two version strings imply the same version.
(riscv_common_function_versions): Removed.
(TARGET_OPTION_FUNCTION_VERSIONS): Removed.
(TARGET_OPTION_SAME_FUNCTION_VERSIONS): New macro.
* doc/tm.texi: Regenerated.
* target.def: Remove common_version hook and add same_function_version
hook.
* doc/tm.texi.in: Ditto.
* tree.cc (distinct_version_decls): New function.
(mergeable_version_decls): Ditto.
* tree.h (distinct_version_decls): New function.
(mergeable_version_decls): Ditto.
* hooks.h (hook_stringslice_stringslice_unreachable): New function.
* hooks.cc (hook_stringslice_stringslice_unreachable): New function.
gcc/cp/ChangeLog:
* class.cc (resolve_address_of_overloaded_function): Updated to use
disjoint_version_decls instead of common_function_version hook.
* decl.cc (decls_match): Refactor to use disjoint_version_decls and
to pass through conflicting_version argument.
(maybe_version_functions): Updated to use
disjoint_version_decls instead of common_function_version hook.
(duplicate_decls): Add logic to handle conflicting unmergeable decls
and improve diagnostics for conflicting versions.
* decl2.cc (check_classfn): Updated to use
disjoint_version_decls instead of common_function_version hook.
Alfie Richards [Wed, 28 May 2025 15:42:07 +0000 (15:42 +0000)]
c: c++: Add target_[version/clones] to decl diagnostics formatting.
Adds the target_version and target_clones attributes to diagnostic messages
for target_version semantics.
This is because the target_version/target_clones attributes affect the identity
of the decls, so need to be represented in diagnostics for them.
This also requires making maybe_print_whitespace available to c++ code so
we can control if whitespace is needed consistently between c and c++
diagnostics.
After this change diagnostics look like:
c:
```
test.c:6:8: error: redefinition of ‘foo [[target_version("sve")]]’
6 | float foo () {return 1;}
| ^~~
test.c:3:8: note: previous definition of ‘foo [[target_version("sve")]]’ with type ‘float(void)’
3 | float foo () {return 2;}
| ^~~
test.c:12:8: error: redefinition of ‘bar [[target_clones("sve")]]’
12 | float bar () {return 1;}
| ^~~
test.c:9:8: note: previous definition of ‘bar [[target_clones("sve")]]’ with type ‘float(void)’
9 | float bar () {return 2;}
| ^~~
```
c++:
```
test.cpp:6:8: error: redefinition of ‘float foo [[target_version("sve")]] ()’
6 | float foo () {return 1;}
| ^~~
test.cpp:3:8: note: ‘float foo [[target_version("sve")]] ()’ previously defined here
3 | float foo () {return 2;}
| ^~~
test.cpp:12:8: error: redefinition of ‘float bar [[target_clones("sve")]] ()’
12 | float bar () {return 1;}
| ^~~
test.cpp:9:8: note: ‘float bar [[target_clones("sve")]] ()’ previously defined here
9 | float bar () {return 2;}
| ^~~
```
This only affects targets which use target_version (aarch64 and riscv).
gcc/c-family/ChangeLog:
* c-pretty-print.cc (pp_c_function_target_version): New function.
(pp_c_function_target_clones): New function.
(pp_c_maybe_whitespace): Move to c-pretty-print.h.
* c-pretty-print.h (pp_c_function_target_version): New function.
(pp_c_function_target_clones): New function.
(pp_c_maybe_whitespace): Moved here from c-pretty-print.cc.
gcc/c/ChangeLog:
* c-objc-common.cc (c_tree_printer): Add printing of target_clone and
target_version in decl diagnostics.
gcc/cp/ChangeLog:
* cxx-pretty-print.h (pp_cxx_function_target_version): New macro.
(pp_cxx_function_target_clones): Ditto.
(pp_cxx_maybe_whitespace): Ditto.
* error.cc (dump_function_decl): Add printing of target_clone and
target_version in decl diagnostics.
Alfie Richards [Thu, 13 Feb 2025 15:30:45 +0000 (15:30 +0000)]
fmv: c++: Change target_version semantics to follow ACLE specification.
This patch changes the semantics of target_version and target_clones attributes
to match the behavior described in the Arm C Language Extensions (ACLE).
The changes to behavior are:
- The scope and signature of an FMV function set is now that of the default
version.
- The FMV resolver is now created at the location of the default version
implementation. Previously this was at the first call to an FMV function.
- When a TU has a single annotated function version, it gets mangled.
- This includes a lone annotated default version.
This only affects targets with TARGET_HAS_FMV_TARGET_ATTRIBUTE set to false.
Currently that is aarch64 and riscv.
This is achieved by:
- Skipping the existing FMV dispatching code at C++ gimplification and instead
making use of the target_clones dispatching code in multiple_target.cc.
(This fixes PR target/118313 for aarch64 and riscv).
- Splitting the target_clones pass in two, an early and a late pass, where the
early pass handles cases where multiple declarations are used to define a
version, and the late pass handles target semantics targets and cases where
an FMV set is defined by a single target_clones decl.
- Changing the logic in add_candidates and
resolve_address_of_overloaded_function to prevent resolution of any version
except a default version (thus making the default version determine the scope
and signature of the versioned function set).
- Adding logic for dispatching a lone annotated default version in
multiple_target.cc.
- As an annotated default version gets mangled, an alias is created from the
dispatched symbol to the default version, as no ifunc resolution is required
in this case (i.e. an alias from `_Z3foov` to `_Z3foov.default`).
- Adding logic to `symbol_table::remove_unreachable_nodes` and analyze_functions
so that a reference to the default function version also implies a possible
reference to the other versions (so they shouldn't be deleted and do need to
be analyzed).
gcc/ChangeLog:
PR target/118313
* cgraph.cc (delete_function_version): Made public static member of
cgraph_node.
* cgraph.h (delete_function_version): Ditto.
* cgraphunit.cc (analyze_functions): Add logic for target version
dependencies.
* ipa.cc (symbol_table::remove_unreachable_nodes): Ditto.
* multiple_target.cc (create_dispatcher_calls): Change to support
target version semantics.
(ipa_target_clone): Change to dispatch all function sets in
target_version semantics, and to have early and late pass.
(expand_target_clones): Add logic for cases of target_clones with no
defaults.
(is_simple_target_clones_case): New function.
(class pass_target_clone): New parameter for early or late pass.
* config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher):
Refactor with the assumption that the DECL node will be default.
* config/riscv/riscv.cc (riscv_get_function_versions_dispatcher):
Refactor with the assumption that the DECL node will be default.
* passes.def: Split target_clones pass into early and late version.
gcc/cp/ChangeLog:
PR target/118313
* call.cc (add_candidates): Change to not resolve non-default versions
in target_version semantics.
* class.cc (resolve_address_of_overloaded_function): Ditto.
* cp-gimplify.cc (cp_genericize_r): Change logic to not apply for
target_version semantics.
* decl.cc (maybe_mark_function_versioned): Remove static.
* cp-tree.h (maybe_mark_function_versioned): New function.
* decl2.cc (cplus_decl_attributes): Change to mark and therefore
mangle all target_version decls in target_version semantics.
* typeck.cc (cp_build_function_call_vec): Add error for calling
unresolvable non-default node in target_version semantics.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/mv-1.C: Change for target_version semantics.
* g++.target/aarch64/mv-symbols2.C: Ditto.
* g++.target/aarch64/mv-symbols3.C: Ditto.
* g++.target/aarch64/mv-symbols4.C: Ditto.
* g++.target/aarch64/mv-symbols5.C: Ditto.
* g++.target/aarch64/mvc-symbols3.C: Ditto.
* g++.target/riscv/mv-symbols2.C: Ditto.
* g++.target/riscv/mv-symbols3.C: Ditto.
* g++.target/riscv/mv-symbols4.C: Ditto.
* g++.target/riscv/mv-symbols5.C: Ditto.
* g++.target/riscv/mvc-symbols3.C: Ditto.
* g++.target/aarch64/mv-symbols10.C: New test.
* g++.target/aarch64/mv-symbols11.C: New test.
* g++.target/aarch64/mv-symbols12.C: New test.
* g++.target/aarch64/mv-symbols13.C: New test.
* g++.target/aarch64/mv-symbols6.C: New test.
* g++.target/aarch64/mv-symbols7.C: New test.
* g++.target/aarch64/mv-symbols8.C: New test.
* g++.target/aarch64/mv-symbols9.C: New test.
Alfie Richards [Mon, 24 Mar 2025 15:04:38 +0000 (15:04 +0000)]
fmv: c++: Add check_target_clone hook for filtering target_clone versions.
This patch introduces the TARGET_CHECK_TARGET_CLONE_VERSION hook
which is used to determine if a target_clones version string parses.
The hook has a flag to enable emitting diagnostics.
This is as specified in the Arm C Language Extension. The purpose of this
is to be able to ignore invalid versions to allow some portability of code
using target_clones attributes.
Currently this is only properly implemented for the AArch64 backend.
For riscv, which is the only other backend that uses target_version
semantics, a partial implementation is present, where this hook is used
to check parsing, and errors are emitted on a failed parse
rather than warnings. A refactor of the riscv parsing logic would be
required to enable this functionality fully.
This fixes PR 118339, where parse failures could cause an ICE on AArch64.
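The filtering behaviour described above can be sketched as follows. This is an illustrative model only: `VersionCheck`, `filter_clone_versions`, and `toy_check_version` are hypothetical names, and the toy checker's list of known versions is an assumption, not the parsing logic of any backend.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the target hook: returns true if the version
// string parses; the flag controls whether a diagnostic is emitted.
using VersionCheck = std::function<bool(const std::string &, bool)>;

// Keep only the versions the checker accepts, preserving their order.
std::vector<std::string>
filter_clone_versions(const std::vector<std::string> &versions,
                      const VersionCheck &check, bool diagnose)
{
  assert(check);
  std::vector<std::string> kept;
  for (const std::string &v : versions)
    if (check(v, diagnose))
      kept.push_back(v);
  return kept;
}

// Toy checker with an assumed list of valid names; invalid versions are
// warned about (when requested) and rejected rather than being an error.
bool toy_check_version(const std::string &version, bool diagnose)
{
  static const std::set<std::string> known = {"default", "sve", "sve2"};
  if (known.count(version) != 0)
    return true;
  if (diagnose)
    std::cerr << "warning: invalid version '" << version << "' ignored\n";
  return false;
}
```

With this model, filtering `{"default", "sve", "bogus"}` keeps the first two entries and drops the unrecognised one, which is the portability behaviour the hook is meant to allow.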
gcc/ChangeLog:
PR target/118339
* target.def: Add check_target_clone_version hook.
* tree.cc (get_clone_attr_versions): Add filter argument.
(get_clone_versions): Add filter argument.
* tree.h (get_clone_attr_versions): Add filter.
(get_clone_versions): Add filter argument.
* config/aarch64/aarch64.cc (aarch64_check_target_clone_version):
New function.
(TARGET_CHECK_TARGET_CLONE_VERSION): New define.
* config/riscv/riscv.cc (riscv_check_target_clone_version):
New function.
(TARGET_CHECK_TARGET_CLONE_VERSION): New define.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Add documentation for new hook.
* hooks.h (hook_stringslice_locationtptr_true): New function.
* hooks.cc (hook_stringslice_locationtptr_true): New function.
gcc/c-family/ChangeLog:
* c-attribs.cc (handle_target_clones_attribute): Update to use new hook.
Alfie Richards [Mon, 24 Mar 2025 11:45:32 +0000 (11:45 +0000)]
riscv: Refactor riscv target parsing to take string_slice.
This is a quick refactor of the riscv target processing code
to take a string_slice rather than a decl.
The reason for this is to enable it to work with target_clones
where merging logic requires reasoning about each version string
individually in the front end.
This refactor primarily serves just to get this working. Ideally the
logic here would be further refactored as currently there is no way to
check if a parse fails or not without emitting an error.
This makes things difficult for later patches, which intend to emit a
warning and ignore unrecognised/unparsed target_clones values rather
than erroring, which can't be achieved with the current riscv
code.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_process_target_version_str): New function.
* config/riscv/riscv-target-attr.cc (riscv_process_target_attr): Refactor to take
string_slice.
(riscv_process_target_version_str): New function.
* config/riscv/riscv.cc (parse_features_for_version): Refactor to take
string_slice.
(riscv_compare_version_priority): Ditto.
(dispatch_function_versions): Change to pass location.
Alfie Richards [Wed, 12 Feb 2025 14:13:02 +0000 (14:13 +0000)]
x86: fmv: Refactor FMV name mangling.
This patch is an overhaul of how FMV name mangling works. Previously
mangling logic was duplicated in several places across both target
specific and independent code. This patch changes this such that all
mangling is done in targetm.mangle_decl_assembler_name (including for the
dispatched symbol and dispatcher resolver).
Adds the assembler_name member to cgraph_function_version_info to store
the base assembler name of the function set, before FMV mangling.
This allows for the removing of previous hacks, such as where the default
mangled decl's assembler name was unmangled to then remangle all versions
and the resolver and dispatched symbol.
This introduces a change (shown in test changes) for the assembler name of the
dispatched symbol for an x86 versioned function set. Previously it used the
function name mangled twice. This was hard to reproduce without hacks I
wasn't comfortable with. Therefore, the mangling is changed to instead append
".ifunc" which matches clang's behavior.
This change also refactors expand_target_clone using
targetm.mangle_decl_assembler_name for mangling and get_clone_versions.
It is modified such that if the target_clone is in a FMV structure
the ordering is preserved once expanded. This is used later for ACLE semantics
and target_clone/target_version mixing.
gcc/ChangeLog:
* attribs.cc (make_dispatcher_decl): Move duplicated cgraph logic into
this function and change to use targetm.mangle_decl_assembler_name for
mangling.
* cgraph.cc (cgraph_node::insert_new_function_version): Record
assembler_name.
* cgraph.h (struct cgraph_function_version_info): Add assembler_name.
(struct cgraph_node): Add dispatcher_resolver_function and
is_target_clone.
* config/aarch64/aarch64.cc (aarch64_parse_fmv_features): Change to
support string_slice.
(aarch64_process_target_version_attr): Ditto.
(get_feature_mask_for_version): Ditto.
(aarch64_mangle_decl_assembler_name): Add logic for mangling dispatched
symbol and resolver.
(get_suffixed_assembler_name): Removed.
(make_resolver_func): Refactor to use
aarch64_mangle_decl_assembler_name for mangling.
(aarch64_generate_version_dispatcher_body): Remove remangling.
(aarch64_get_function_versions_dispatcher): Refactor to remove
duplicated cgraph logic.
* config/i386/i386-features.cc
(ix86_mangle_function_version_assembler_name): Refactor to use
clone_identifier and to mangle default.
(ix86_mangle_decl_assembler_name): Add logic for mangling dispatched
symbol and resolver.
(ix86_get_function_versions_dispatcher): Remove duplicated cgraph
logic.
(make_resolver_func): Refactor to use ix86_mangle_decl_assembler_name
for mangling.
* config/riscv/riscv.cc (riscv_mangle_decl_assembler_name): Add logic
for FMV mangling.
(get_suffixed_assembler_name): Removed.
(make_resolver_func): Refactor to use riscv_mangle_decl_assembler_name
for mangling.
(riscv_generate_version_dispatcher_body): Remove unnecessary remangling.
(riscv_get_function_versions_dispatcher): Remove duplicated cgraph
logic.
* config/rs6000/rs6000.cc (rs6000_mangle_decl_assembler_name): New
function.
(rs6000_get_function_versions_dispatcher): Remove duplicated cgraph
logic.
(make_resolver_func): Refactor to use rs6000_mangle_decl_assembler_name
for mangling.
(rs6000_mangle_function_version_assembler_name): New function.
* multiple_target.cc (create_dispatcher_calls): Remove mangling code.
(get_attr_str): Removed.
(separate_attrs): Ditto.
(is_valid_asm_symbol): Removed.
(create_new_asm_name): Ditto.
(expand_target_clones): Refactor to use
targetm.mangle_decl_assembler_name for mangling and be more general.
* tree.cc (get_target_clone_attr_len): Removed.
* tree.h (get_target_clone_attr_len): Removed.
gcc/cp/ChangeLog:
* decl.cc (maybe_mark_function_versioned): Change to insert function version
and therefore record assembler name.
Alfie Richards [Fri, 31 Jan 2025 11:47:57 +0000 (11:47 +0000)]
cgraph: Add clone_identifier function.
This is similar to clone_function_name and its siblings but takes an
identifier tree node rather than a function declaration.
This is to be used in conjunction with the identifier node stored in
cgraph_function_version_info::assembler_name to mangle FMV functions in
later patches.
gcc/ChangeLog:
* cgraph.h (clone_identifier): New function.
* cgraphclones.cc (clone_identifier): New function.
(clone_function_name): Refactored to use clone_identifier.
(is_valid_asm_symbol): New helper function.
Jonathan Wakely [Fri, 19 Sep 2025 11:11:26 +0000 (12:11 +0100)]
libstdc++: Remove unwanted PACKAGE macros from c++config.h [PR79147]
Autoconf insists on adding macros like PACKAGE_NAME and
PACKAGE_BUG_TARNAME to config.h but those are useless for libstdc++
because it's not a complete package, just a sub-directory of gcc, and we
never use any of those strings in our sources.
Since we include the generated config.h in our installed c++config.h
header, those useless macros are exposed to users. We do transform them
to use the reserved _GLIBCXX_ prefix, but they're still just useless
noise in the installed header.
I don't know any way to get autoconf to not add them to config.h but
this change comments them out so they're not defined when users include
our headers.
Although not really important now that the macro isn't being defined,
this change also avoids the double substitution for PACKAGE_VERSION
which was resulting in _GLIBCXX_PACKAGE__GLIBCXX_VERSION.
libstdc++-v3/ChangeLog:
PR libstdc++/79147
* include/Makefile.am (c++config.h): Adjust sed command to
comment out all PACKAGE_XXX macros and to avoid adjusting
PACKAGE_VERSION twice.
* include/Makefile.in: Regenerate.
Tomasz Kamiński [Tue, 23 Sep 2025 05:51:18 +0000 (07:51 +0200)]
libstdc++: Remove leftover __formatter_chrono base classes.
This patch removes the __formatter_chrono<_CharT> base class from the
formatters for utc_time, gps_time, and tai_time. These formatters
are using the __formatter_duration<_CharT> member only.
Since it regressed SPEC performance (refer to PR121994), I guess
it's related to register pressure and can be tuned by adjusting
reduc_lat_mult_thr. I don't have a Zen2 machine, so for simplicity, I'll
just disable unrolling in the vectorizer for Zen2.
Also adjust count number for {AVX256,AVX512}_SPLIT_REGS.
gcc/ChangeLog:
PR target/121994
* config/i386/x86-tune-costs.h (znver2_cost): Set
vect_unroll_limit to 1.
(znver1_cost): Ditto.
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Adjust count number for {AVX256,AVX512}_SPLIT_REGS.
Peter Bergner [Mon, 22 Sep 2025 17:17:26 +0000 (12:17 -0500)]
RISC-V: Add missing define_insn_reservation to tt-ascalon-d8.md [PR121982]
The tt-ascalon-d8's pipeline description has reservations for 16-bit, 32-bit
and 64-bit vector integer divides, but was missing a reservation for 8-bit
vector integer divides, leading to an ICE. Add the missing reservation.
2025-09-22 Peter Bergner <bergner@tenstorrent.com>
gcc/
PR target/121982
* config/riscv/tt-ascalon-d8.md (tt_ascalon_d8_vec_idiv_byte): New
define_insn_reservation.
gcc/testsuite/
PR target/121982
* gcc.target/riscv/pr121982.c: New test.
Signed-off-by: Peter Bergner <bergner@tenstorrent.com>
c++: Fix canonical type for lambda pack captures [PR122015]
comp_template_parms_position uses whether a TEMPLATE_TYPE_PARM is a pack
to determine equivalency. This in turn affects whether
canonical_type_parameter finds a pre-existing auto type as equivalent.
When generating the 'auto...' type for a lambda pack capture, we only
mark it as a pack after generating the node (and calculating its
canonical); this means that later when comparing a version streamed in
from a module we think that two equivalent types have different
TYPE_CANONICAL, because the latter already had
TEMPLATE_PARM_PARAMETER_PACK set before calculating its canonical.
This patch fixes this by using a new 'make_auto_pack' function to ensure
that packness is set before the canonical is looked up.
PR c++/122015
gcc/cp/ChangeLog:
* cp-tree.h (make_auto_pack): Declare.
* lambda.cc (lambda_capture_field_type): Use make_auto_pack to
ensure TYPE_CANONICAL is set correctly.
* pt.cc (make_auto_pack): New function.
gcc/testsuite/ChangeLog:
* g++.dg/modules/lambda-11.h: New test.
* g++.dg/modules/lambda-11_a.H: New test.
* g++.dg/modules/lambda-11_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Patrick Palka <ppalka@redhat.com>
Jonathan Wakely [Fri, 19 Sep 2025 16:28:51 +0000 (17:28 +0100)]
top-level: Add .editorconfig file
This config file sets default formatting behaviour for a large number
of common editors, see https://editorconfig.org
It also ensures that https://forge.sourceware.org formats GCC code
correctly, because it defaults to tab_width=4 but will respect a
.editorconfig file if present in the repo.
Andrew Pinski [Fri, 19 Sep 2025 21:37:04 +0000 (14:37 -0700)]
fab/gimple-fold: Move __builtin_constant_p folding to gimple-fold [PR121762]
This is the first patch toward removing the fold_all_builtins pass.
We want to fold __builtin_constant_p into 0 if we know the argument can't be
a constant. Currently that is done in the fab pass (though ranger handles it now too).
Instead of having fab do it, we can check whether PROP_last_full_fold is set and
fold it to 0 in that case.
Note for -Og, fab was the only place which did this conversion, so we need to
set PROP_last_full_fold for it; later on fab will be removed and isel will do
it instead but that is for another day.
Also instead of going through fold_call_stmt to call fold_builtin_constant_p,
fold_builtin_constant_p is called directly from gimple_fold_builtin_constant_p.
This should speed up compilation slightly :).
Note fab was originally added to do this transformation during the development
of the ssa branch.
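For readers unfamiliar with the builtin, here is a minimal sketch of how __builtin_constant_p is typically used. The function names are hypothetical; which branch survives for a non-literal argument depends on the optimization level and on what the compiler can prove, so only the literal case is asserted.

```cpp
#include <cassert>

// Generic path, used when the argument is not known to be constant.
int generic_square(int x)
{
  return x * x;
}

int square(int x)
{
  assert(x > -46341 && x < 46341);  // keep x * x within 32-bit int range
  // __builtin_constant_p (a GCC builtin, also accepted by Clang) yields 1
  // if the compiler knows the argument is a compile-time constant, and is
  // folded to 0 once it is clear that it cannot be one.
  if (__builtin_constant_p(x))
    return x * x;            // candidate for constant folding
  return generic_square(x);  // both paths compute the same value
}

// A literal argument is a constant at every optimization level.
static_assert(__builtin_constant_p(42), "literals are compile-time constants");
```

Both branches return the same result, so the folding decision only affects which code is emitted, not the program's behaviour.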
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/121762
gcc/ChangeLog:
* builtins.cc (fold_builtin_constant_p): Make non-static.
* builtins.h (fold_builtin_constant_p): New declaration.
* gimple-fold.cc (gimple_fold_builtin_constant_p): New function.
(gimple_fold_builtin): Call gimple_fold_builtin_constant_p
for BUILT_IN_CONSTANT_P.
* tree-ssa-ccp.cc (pass_fold_builtins::execute): Set PROP_last_full_fold
on curr_properties. Remove handling of BUILT_IN_CONSTANT_P.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Tomasz Kamiński [Thu, 11 Sep 2025 09:11:06 +0000 (11:11 +0200)]
libstdc++: Rework handling of ISO week calendar and week index formatting.
The handling of ISO week-calendar year specifiers (%G, %g) and ISO week
number (%V) was merged into a single _M_g_G_V function, as the latter
requires ISO year value, computed by the former.
The values for %U and %W, which are based on the number of days since the
first Sunday and Monday of the year respectively, are now expressed as an
offset from the existing _M_day_of_year field. This reduces redundant
computation. The required flags were also updated to only need _DayOfYear
and _Weekday.
The _M_g_G_V function uses _M_day_of_year to compute __idoy, the day of the
year for the nearest Thursday. This value is used to determine if the ISO
year is the previous year (__idoy <= 0), the current year (__idoy <= 365/366),
next year (__idoy <= 730), or later year. This avoids an expensive conversion
from local_days to year_month_day in most cases. If the ISO calendar year
is current year, the __idoy value is reused for weekday index computation.
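The day-of-year approach described above can be sketched as follows. This is a simplified illustration for a single calendar date, not the libstdc++ code: the function names and the `IsoWeek` struct are hypothetical, and the multi-year cases mentioned above ("next year" versus "later year") are not modelled.

```cpp
#include <cassert>

struct IsoWeek {
  int year;  // ISO week-calendar year (%G)
  int week;  // ISO week number (%V), 1..53
};

constexpr bool is_leap(int y)
{
  return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

constexpr int days_in_year(int y) { return is_leap(y) ? 366 : 365; }

// day_of_year is 1-based; iso_weekday is 1 (Monday) .. 7 (Sunday).
IsoWeek iso_week(int year, int day_of_year, int iso_weekday)
{
  assert(iso_weekday >= 1 && iso_weekday <= 7);
  assert(day_of_year >= 1 && day_of_year <= days_in_year(year));
  // Day of the year of the Thursday in the same ISO week.
  int idoy = day_of_year + (4 - iso_weekday);
  int iso_year = year;
  if (idoy <= 0) {                          // Thursday is in the previous year
    iso_year = year - 1;
    idoy += days_in_year(iso_year);
  } else if (idoy > days_in_year(year)) {   // Thursday is in the next year
    idoy -= days_in_year(year);
    iso_year = year + 1;
  }
  return {iso_year, (idoy - 1) / 7 + 1};
}

// %U: week of the year, counting from the first Sunday.
int week_from_sunday(int day_of_year, int iso_weekday)
{
  const int days_since_sunday = iso_weekday % 7;
  return (day_of_year - 1 + 7 - days_since_sunday) / 7;
}

// %W: week of the year, counting from the first Monday.
int week_from_monday(int day_of_year, int iso_weekday)
{
  const int days_since_monday = iso_weekday - 1;
  return (day_of_year - 1 + 7 - days_since_monday) / 7;
}
```

For example, 1 January 2021 was a Friday, so `iso_week(2021, 1, 5)` lands on a Thursday in the previous year and yields ISO year 2020, week 53, without any conversion back to a year/month/day value.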
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_parse): Update
needed flags for %g, %G, %V, %U, %W.
(__formatter_chrono::_M_format_to): Change how %V is handled.
(__formatter_chrono::_M_g_G): Merged into _M_g_G_V.
(__formatter_chrono::_M_g_G_V): Reworked from _M_g_G.
(__formatter_chrono::_M_U_V_W): Changed into _M_U_V.
(__formatter_chrono::_M_U_W): Reworked implementation.
* testsuite/std/time/year_month_day/io.cc: New tests.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Mon, 22 Sep 2025 11:31:17 +0000 (13:31 +0200)]
libstdc++: Move start_lifetime_as functions to bits/stl_construct.h [PR106658]
This allows inplace_vector to use these functions without including the entire
<memory> header.
Preprocessor checks are changed to use __glibcxx macros, so the new functions
are available outside the memory header, which exports the __cpp_lib macros.
PR libstdc++/106658
libstdc++-v3/ChangeLog:
* include/bits/stl_construct.h (std::start_lifetime_as_array)
(std::start_lifetime_as): Moved from std/memory, with update
to guard macros.
* include/std/memory (std::start_lifetime_as_array)
(std::start_lifetime_as): Moved to bits/stl_construct.h.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
To:
int32_t _5 = (int32_t)_3; // zero-extend 16 => 32
That is a problem for sign-extension: the highest bits may be all 1s,
but they will be lost after converting to zero-extend. Thus, there are more
cases if the conversion has different types: Case 1 as above, and Cases 2,
3, and 4 as follows.
Then we can see that there will be a miscompile if and only if there is
a cast from a smaller to a bigger size with sign extension. Thus, restrict the
check and stop the propagation if there is a sign-extend cast.
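The difference between the two widenings can be shown with a small standalone sketch. The helper names are hypothetical and this is plain C++, not the GIMPLE handled by the patch; it only illustrates why replacing a sign-extending cast with a zero-extending one changes the result for negative values.

```cpp
#include <cassert>
#include <cstdint>

// Widening a signed 16-bit value sign-extends: the top bit is copied
// into the upper 16 bits of the result.
std::int32_t widen_sign_extend(std::int16_t v)
{
  return static_cast<std::int32_t>(v);
}

// Widening the same bits as an unsigned 16-bit value zero-extends:
// the upper 16 bits of the result are always 0.
std::int32_t widen_zero_extend(std::int16_t v)
{
  return static_cast<std::int32_t>(static_cast<std::uint16_t>(v));
}

// The two widenings agree only when the sign bit is clear.
bool widenings_agree(std::int16_t v)
{
  return widen_sign_extend(v) == widen_zero_extend(v);
}

void demo_widening()
{
  assert(widen_sign_extend(-1) == -1);      // upper bits are all 1s
  assert(widen_zero_extend(-1) == 65535);   // upper bits are lost
  assert(widenings_agree(5));
  assert(!widenings_agree(-1));
}
```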
The below test suites are passed for this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.
PR middle-end/122021
gcc/ChangeLog:
* tree-ssa-math-opts.cc (build_and_insert_cast): Add sign-extend
check before prop.
Richard Biener [Mon, 22 Sep 2025 08:14:31 +0000 (10:14 +0200)]
tree-optimization/122016 - PRE insertion breaks abnormal coalescing
When PRE asks VN to simplify a NARY but not insert, that bypasses
the abnormal guard in maybe_push_res_to_seq and we blindly accept
new uses of abnormals. The following fixes this.
PR tree-optimization/122016
* tree-ssa-sccvn.cc (vn_nary_simplify): Do not use the
simplified expression when it references abnormals.
Eric Botcazou [Mon, 22 Sep 2025 09:08:34 +0000 (11:08 +0200)]
Ada: Fix internal error on use clause present in generic formal part
This is a regression present on the mainline and 15 branch: the compiler
aborts on a use clause present in the formal part of a generic unit because
of an oversight in the new inference code for generic actual parameters.
The fix also adds a missing test to Analyze_Dimension_Array_Aggregate.
gcc/ada/
PR ada/121968
* sem_ch12.adb (Associations.Find_Assoc): Add guard for clauses.
* sem_dim.adb (Analyze_Dimension_Array_Aggregate): Add test for
N_Iterated_Component_Association nodes.
Dimitar Dimitrov [Sat, 19 Oct 2024 20:40:35 +0000 (23:40 +0300)]
pru: Reject bit-fields for TI ABI
TI ABI has non-conventional requirements for bit-fields, which cannot
be implemented with the current target hooks in GCC.
Target hooks are focused on packing and alignment. But PRU uses packed
structs by default, and has 1 byte alignment for all types. As an
example, this makes it difficult to implement the TI ABI requirement
for the following struct to be sized to 4 bytes, per the bit-field type:
struct S { int i : 1; };
Instead of introducing new target hooks and making risky changes to
common GCC code, simply declare bit-fields as not supported in TI ABI
mode.
PRU is a baremetal target. It has neither support for interrupts nor an
RTOS. Hence ABI compatibility is not that critical. I have not seen
any projects which rely on ABI compatibility in order to mix object
files from GCC and the TI proprietary compiler.
The target-specific pass to scan for TI ABI compatibility was rewritten
as an IPA pass. This allowed scanning not only of function bodies, but
also global variable declarations. Diagnostic locations should now be
more accurate. Thus some test cases had to be adjusted.
PR target/116205
gcc/ChangeLog:
* config/pru/pru-passes.cc (class pass_pru_tiabi_check): Make
this an IPA pass.
(chkp_type_has_function_pointer): Remove.
(check_type_tiabi_compatibility): New function.
(chk_function_decl): Rename.
(check_function_decl): Simplify.
(check_op_callback): Rework to use
check_type_tiabi_compatibility.
(pass_pru_tiabi_check::execute): Rework to scan all symbols and
gimple contents of all defined functions.
* config/pru/pru-passes.def (INSERT_PASS_AFTER): Move after
pass_ipa_auto_profile_offline.
* config/pru/pru-protos.h (make_pru_tiabi_check): New
declaration to mark as IPA pass.
(make_pru_minrt_check): Specify it is making a gimple pass.
* doc/invoke.texi: Document that bit-fields are now rejected for
TI ABI.
gcc/testsuite/ChangeLog:
* gcc.target/pru/mabi-ti-1.c: Adjust diagnostic location.
* gcc.target/pru/mabi-ti-2.c: Ditto.
* gcc.target/pru/mabi-ti-3.c: Ditto.
* gcc.target/pru/mabi-ti-5.c: Ditto.
* gcc.target/pru/mabi-ti-6.c: Ditto.
* gcc.target/pru/mabi-ti-7.c: Adjust diagnostic locations and
add global variables for checking.
* gcc.target/pru/mabi-ti-11.c: New test.
* gcc.target/pru/mabi-ti-12.c: New test.
* gcc.target/pru/mabi-ti-8.c: New test.
* gcc.target/pru/mabi-ti-9.c: New test.
Andrew Pinski [Fri, 19 Sep 2025 19:23:57 +0000 (12:23 -0700)]
fab: Remove forced label check from optimize_unreachable
Since optimize_unreachable does not directly remove the bb, we can still remove
the condition that goes to a block containing a forced label. This is a small cleanup
from the original patch which added optimize_unreachable.
The review of the original patch missed that the bb was not being removed by the pass
but later on by cleanupcfg; https://gcc.gnu.org/pipermail/gcc-patches/2012-July/343239.html.
Which is why this is allowed to be done.
I added another testcase to check that the `if` is removed too.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-ccp.cc (optimize_unreachable): Don't check for forced labels.
gcc/testsuite/ChangeLog:
* gcc.dg/builtin-unreachable-7.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jan Hubicka [Sun, 21 Sep 2025 17:54:51 +0000 (19:54 +0200)]
Update calls_comdat_local in cgraph_node::create_version_clone
This patch fixes an ICE when ipa-split is run from ipa-profile. In normal
computation we recompute the flag elsewhere, but it is supposed to be kept
up-to-date by passes possibly modifying it.
Jan Hubicka [Sun, 21 Sep 2025 17:51:57 +0000 (19:51 +0200)]
One extra special case for AFDO0
This patch makes the inliner and ipa-cp consider an optimization interesting even
in scenarios where the AFDO count is 0 but the scale is high enough to make the
optimization worthwhile.
gcc/ChangeLog:
* cgraph.cc (cgraph_edge::maybe_hot_p): For AFDO profiles force
count to be non-zero.