Pan Li [Mon, 7 Jul 2025 03:07:11 +0000 (11:07 +0800)]
RISC-V: Combine vec_duplicate + vssub.vv to vssub.vx on GR2VR cost
This patch combines vec_duplicate + vssub.vv into vssub.vx, as in the
example code below. The related pattern depends on the cost of
vec_duplicate from GR2VR: late-combine performs the combination if the
GR2VR cost is zero and rejects it if the GR2VR cost is greater than
zero.
Assume we have the example code below and a GR2VR cost of 0.
#define DEF_SAT_S_ADD(T, UT, MIN, MAX) \
T \
test_##T##_sat_add (T x, T y) \
{ \
T sum = (UT)x + (UT)y; \
return (x ^ y) < 0 \
? sum \
: (sum ^ x) >= 0 \
? sum \
: x < 0 ? MIN : MAX; \
}
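As a rough illustration of the operation the combined instruction covers, here is a minimal scalar sketch of a signed saturating subtraction applied with one broadcast scalar. The names sat_sub_i8 and sat_sub_scalar are hypothetical and not part of the patch; the sketch assumes 8-bit elements and says nothing about what code the compiler actually emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: signed saturating subtraction for one lane.
// The result is clamped to the int8_t range instead of wrapping around.
inline std::int8_t sat_sub_i8(std::int8_t x, std::int8_t y) {
  // Widen to int so the intermediate difference cannot overflow.
  int wide = static_cast<int>(x) - static_cast<int>(y);
  if (wide > std::numeric_limits<std::int8_t>::max())
    return std::numeric_limits<std::int8_t>::max();
  if (wide < std::numeric_limits<std::int8_t>::min())
    return std::numeric_limits<std::int8_t>::min();
  return static_cast<std::int8_t>(wide);
}

// Hypothetical loop: every element has the same scalar subtracted. This is
// the "vector minus broadcast scalar" shape (vec_duplicate + vector op)
// that a vector-scalar instruction form can express directly.
inline void sat_sub_scalar(std::int8_t *out, const std::int8_t *in,
                           std::int8_t scalar, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = sat_sub_i8(in[i], scalar);
}
```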
* config/riscv/riscv-v.cc (expand_vx_binary_vec_vec_dup): Add
new case SS_MINUS.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op ss_minus.
[PATCH] RISC-V: Enable zvfh for vector-scalar half-float run tests
zvfh is not enabled at the testsuite level. It has to be enabled on a testcase
by testcase basis. This was correctly done for compile tests but not for run
tests. This patch fixes it.
Also, to ensure correct results with half-precision floats, MAX_RELATIVE_DIFF is
set according to the type.
The patch changes order of inclusions, i.e. elfos.h is included before
target specific h8300/h8300.h, in a way similar to a few other targets.
Thanks to this change it is possible to override macros from elfos.h in
h8300/h8300.h, in particular .init/.fini section definitions.
PR target/109286
gcc/ChangeLog:
* config.gcc: Include elfos.h before h8300/h8300.h.
* config/h8300/h8300.h (INIT_SECTION_ASM_OP): Override
default version from elfos.h.
(FINI_SECTION_ASM_OP): Ditto.
(ASM_DECLARE_FUNCTION_NAME): Ditto.
(ASM_GENERATE_INTERNAL_LABEL): Macro removed because it was
being overridden in elfos.h anyway.
(ASM_OUTPUT_SKIP): Ditto.
gimple-fold: extend vector simplification to match scalar bitwise optimizations [PR119196]
Generalize existing scalar gimple_fold rules to apply the same
bitwise comparison simplifications to vector types. Previously, an
expression like
(x < y) && (x > y)
would fold to `false` if x and y are scalars, but equivalent vector
comparisons were left untouched. This patch enables folding of
patterns of the form
(cmp x y) bit_and (cmp x y)
(cmp x y) bit_ior (cmp x y)
(cmp x y) bit_xor (cmp x y)
for vector operands as well, ensuring consistent optimization across
all data types.
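A small sketch of the vector case, assuming the GCC/Clang vector_size extension; the type v4si and both helper names are illustrative only. Vector comparisons produce a mask per lane, so the contradictory pair is combined with the bitwise & rather than the scalar &&.

```cpp
#include <cassert>

// GCC/Clang vector extension: four 32-bit integer lanes (assumed type name).
typedef int v4si __attribute__((vector_size(16)));

// Lane-wise (x < y) & (x > y). A vector comparison yields -1 (all bits set)
// for true and 0 for false in each lane, so the AND is 0 in every lane.
// This is the kind of expression the fold can reduce to a zero vector.
inline v4si contradictory_compare(v4si x, v4si y) {
  return (x < y) & (x > y);
}

// Hypothetical checker used to observe the result lane by lane.
inline bool all_lanes_zero(v4si v) {
  return v[0] == 0 && v[1] == 0 && v[2] == 0 && v[3] == 0;
}
```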
gcc/ChangeLog:
PR tree-optimization/119196
* match.pd: Allow scalar optimizations with bitwise AND/OR/XOR to apply to vectors.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vector-compare-5.c: Add new test for vector compare simplification.
tree-simplify: unify simple_comparison ops in vec_cond for bit and/or/xor [PR119196]
Merge simple_comparison patterns under a single vec_cond_expr for bit_and,
bit_ior, and bit_xor in the simplify pass.
Ensure that when both operands of a bit_and, bit_ior, or bit_xor are simple_comparison
results, they reside within the same vec_cond_expr rather than separate ones.
This prepares the AST so that subsequent transformations (e.g., folding the
comparisons if possible) can take effect.
gcc/ChangeLog:
PR tree-optimization/119196
* match.pd: Merge multiple vec_cond_expr in a single one for
bit_and, bit_ior and bit_xor.
Jeff Law [Wed, 9 Jul 2025 11:23:34 +0000 (05:23 -0600)]
[RISC-V][PR target/120642] Avoid propagating constant AVL for theadvector
AVL propagation currently assumes that it can propagate a constant AVL into any
vector insn and trips an assert if the insn fails to recognize after such a
propagation.
However, for xtheadvector that is not a correct assumption; xtheadvector does
not allow the vector length to be a constant integer (other than zero, which is
allowed via x0).
After consulting with Jin Ma (thanks!) we agree the right fix is to avoid
creating the immediate AVL for xtheadvector.
This has been tested in my tester, just waiting for the pre-commit tester to
spin it.
PR target/120642
gcc/
* config/riscv/riscv-avlprop.cc (pass_avlprop::execute): Do not do
constant AVL propagation for xtheadvector.
gcc/testsuite/
* gcc.target/riscv/rvv/xtheadvector/pr120642.c: New test.
libstdc++: Added missing members to numeric_limits specializations for integer-class types
[iterator.concept.winc]/11 says that std::numeric_limits should be
specialized for integer-class types, with each member defined
appropriately.
libstdc++-v3/ChangeLog:
* include/bits/max_size_type.h (numeric_limits<__max_size_type>):
New members.
(numeric_limits<__max_diff_type>): Likewise.
* testsuite/std/ranges/iota/max_size_type.cc: New test cases.
Richard Biener [Wed, 9 Jul 2025 09:23:30 +0000 (11:23 +0200)]
Avoid accessing STMT_VINFO_VECTYPE
The following fixes up two places we access STMT_VINFO_VECTYPE that's
not covered by the fixup in vect_analyze/transform_stmt to set that
from SLP_TREE_VECTYPE.
* tree-vect-loop.cc (vectorizable_reduction): Get the
output vector type from slp_for_stmt_info.
* tree-vect-stmts.cc (vect_analyze_stmt): Bail out earlier
for PURE_SLP_STMT when doing loop stmt analysis.
Vectorization of int patterns requires a 64bit long type (at least the
way the tests are coded). Fix this to only test for successful
vectorization on 64bit targets.
* gcc.target/s390/vector/pattern-avg-1.c: Fix on -m31.
* gcc.target/s390/vector/pattern-mulh-1.c: Fix on -m31.
* gcc.target/s390/vector/pattern-mulh-2.c: Fix on -m31.
Jan Hubicka [Wed, 9 Jul 2025 09:51:03 +0000 (11:51 +0200)]
Improve afdo_adjust_guessed_profile
This patch makes afdo_adjust_guessed_profile more robust. Instead of using the
median of scales, we compute a robust average where each weight is taken from the
execution count of the edge the scale originates from. I also added a cap, since in
some cases the scaling factor may end up very large, introducing artificial hottest
regions of the program and confusing ipa-profile's histogram-based cutoff.
This was the problem with roms.
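The idea of a weighted average with a cap can be sketched as follows. This is only an illustration under stated assumptions: the struct Scale, the function weighted_scale and the max_scale parameter are hypothetical, and the actual patch caps by the maximum count in the function rather than by a fixed factor.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record of one observed scaling factor.
struct Scale {
  double factor;  // ratio between the profile count and the guessed count
  double weight;  // execution count of the edge the scale came from
};

// Weighted average of the factors, capped at max_scale (assumed parameter).
// Heavily executed edges dominate the result; one outlier on a cold edge
// cannot blow up the final factor.
inline double weighted_scale(const std::vector<Scale> &scales,
                             double max_scale) {
  double sum = 0.0, total = 0.0;
  for (const Scale &s : scales) {
    sum += s.factor * s.weight;
    total += s.weight;
  }
  if (total == 0.0)
    return 1.0;  // nothing observed: leave counts unscaled
  return std::min(sum / total, max_scale);
}
```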
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
* auto-profile.cc (struct scale): New structure.
(add_scale): Also record weights.
(afdo_adjust_guessed_profile): Compute robust average
of scales and cap by max count in function.
Jan Hubicka [Mon, 7 Jul 2025 17:20:25 +0000 (19:20 +0200)]
Fix profile scaling in tree-inline.cc:initialize_cfun
initialize_cfun calls
profile_count::adjust_for_ipa_scaling (&num, &den);
but then the result is never used. This patch fixes it. Overall, the scaling
of entry/exit blocks is a bit sloppy in tree-inline. I will see if I can clean it up.
* tree-inline.cc (initialize_cfun): Use num and den for scaling.
Jan Hubicka [Mon, 7 Jul 2025 15:18:23 +0000 (17:18 +0200)]
Fix auto-profile.cc:get_original_name
There are two bugs in get_original_name. First, the for loop walking the list of
known suffixes uses sizeof (suffixes). It eventually walks to an empty suffix.
The second problem is that strcmp may accept suffixes that are longer, i.e.
mix up .isra with .israabc. This is probably not a big deal, but the first
bug makes get_original_name effectively strip all suffixes, even important
ones, on my setup.
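Both pitfalls can be shown in a short sketch. The table known_suffixes and the function is_known_suffix are hypothetical and do not reproduce the code in auto-profile.cc; the point is only the element count and the boundary check after the match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical suffix table, for illustration only.
static const char *const known_suffixes[] = {".isra", ".constprop", ".part"};

// Returns true if s starts with a known suffix that ends exactly at the end
// of the string or at the next '.', so ".isra.0" matches but ".israabc"
// does not.
inline bool is_known_suffix(const char *s) {
  // sizeof (known_suffixes) is a size in bytes; divide by the element size
  // to get the number of entries, otherwise the loop runs past the table.
  const std::size_t n = sizeof(known_suffixes) / sizeof(known_suffixes[0]);
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t len = std::strlen(known_suffixes[i]);
    if (std::strncmp(s, known_suffixes[i], len) == 0
        && (s[len] == '\0' || s[len] == '.'))
      return true;
  }
  return false;
}
```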
gcc/ChangeLog:
* auto-profile.cc (get_original_name): Fix loop walking the
suffixes.
Jonathan Wakely [Wed, 9 Jul 2025 09:14:23 +0000 (10:14 +0100)]
libstdc++: Fix memory_resource.cc bootstrap failure for non-gthreads targets
The new choose_block_size function added in r16-2112-gac2fb60a67d6d1 was
defined inside an #ifdef _GLIBCXX_HAS_GTHREADS group, which means that
it's not available for single-threaded targets, and so can't be used by
unsynchronized_pool_resource. Move it before that preprocessor group so
it's always defined.
libstdc++-v3/ChangeLog:
* src/c++17/memory_resource.cc: Adjust indentation of unnamed
namespaces.
(pool_sizes): Add comment.
(choose_block_size): Move outside preprocessor group for
gthreads targets.
* testsuite/20_util/synchronized_pool_resource/118681.cc:
Require gthreads.
Thomas Schwinge [Wed, 9 Jul 2025 08:06:39 +0000 (10:06 +0200)]
Fix 'main' function in 'gcc.dg/builtin-dynamic-object-size-pr120780.c'
Fix-up for commit 72e85d46472716e670cbe6e967109473b8d12d38
"tree-optimization/120780: Support object size for containing objects".
'size_t sz' is unused here, and GCC/nvptx doesn't accept this:
spawn -ignore SIGHUP [...]/nvptx-none-run ./builtin-dynamic-object-size-pr120780.exe
error : Prototype doesn't match for 'main' in 'input file 1 at offset 1924', first defined in 'input file 1 at offset 1924'
nvptx-run: cuLinkAddData failed: unknown error (CUDA_ERROR_UNKNOWN, 999)
FAIL: gcc.dg/builtin-dynamic-object-size-pr120780.c execution test
I think a range could be set, since I think the number of latch executions is a
ceiling division of TYPE_MAX_VALUE / vf, to account for the partial iteration.
This would also then deal with the ICE in the PR, where the chosen VF was
much higher than TYPE_MAX_VALUE and a mask is relied upon to make it safe.
Since the patch was supposed to not change behavior, I've added an additional
partial vector check on the const_vf > 0 check to make it explicit that we only
set it on non-partial vectors (the alternative would have been to swap the order of
the vf.constant (&const_vf) check, but that would have hidden the requirement
sneakily).
The second patch adds support for ranges for partial masks.
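The ceiling division mentioned above can be written as a small helper. The name ceil_div is hypothetical; the sketch only shows the arithmetic, including the case where the divisor (the VF) is larger than the dividend (the type's maximum value).

```cpp
#include <cassert>
#include <cstdint>

// Ceiling division: the smallest q such that q * b >= a.
// Written without computing a + b - 1, so it cannot overflow.
inline std::uint64_t ceil_div(std::uint64_t a, std::uint64_t b) {
  assert(b != 0);
  return a / b + (a % b != 0 ? 1 : 0);
}
```

With an 8-bit maximum of 255 and a VF of 16 this gives 16 iterations, the last one partial; with a VF of 512 it gives a single, fully partial iteration.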
gcc/ChangeLog:
PR tree-optimization/120922
* tree-vect-loop-manip.cc (vect_gen_vector_loop_niters): Don't set range
for partial vectors.
gcc/testsuite/ChangeLog:
PR tree-optimization/120922
* gcc.dg/vect/pr120922.c: New test.
The libc memmove and memclr don't reliably operate on full memory words.
We already avoided them on PPC64, but the same problem can occur even
on x86, where some processors use "rep movsb" and "rep stosb".
Always use C code that stores full memory words.
While we're here, clean up the C code. We don't need special handling
if the memmove/memclr pointers are not pointer-aligned.
Unfortunately, this will likely be slower. Perhaps some day we can
have our own assembly code that operates a word at a time,
or we can use different operations when we know there are no pointers.
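A minimal sketch of word-at-a-time move and clear routines, written in C++ for illustration; it is not the libgo code. The names word_move and word_clear are hypothetical. The volatile-qualified destination is an assumption made here so that a compiler does not turn the loops back into memmove or memset calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Copy n pointer-sized words, one whole word per store, handling overlap
// by choosing the copy direction.
inline void word_move(std::uintptr_t *dst, const std::uintptr_t *src,
                      std::size_t n) {
  volatile std::uintptr_t *d = dst;  // keep each store a full-word store
  if (std::less<const std::uintptr_t *>()(dst, src)) {
    for (std::size_t i = 0; i < n; ++i)  // forward copy
      d[i] = src[i];
  } else {
    for (std::size_t i = n; i > 0; --i)  // backward copy
      d[i - 1] = src[i - 1];
  }
}

// Zero n pointer-sized words, one whole word per store.
inline void word_clear(std::uintptr_t *dst, std::size_t n) {
  volatile std::uintptr_t *d = dst;
  for (std::size_t i = 0; i < n; ++i)
    d[i] = 0;
}
```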
syscall: pass correct pointer to system call in recvmsgRaw
The code in recvmsgRaw, introduced in https://go.dev/cl/384695,
incorrectly passed &rsa to the recvmsg system call.
But in recvmsgRaw rsa is already a pointer passed by the caller.
This change passes the correct pointer.
I'm guessing that this didn't show up in the testsuite because
we run the tests in short mode.
Jonathan Wakely [Fri, 4 Jul 2025 15:44:13 +0000 (16:44 +0100)]
libstdc++: Ensure pool resources meet alignment requirements [PR118681]
For allocations with size > alignment and size % alignment != 0 we were
sometimes returning pointers that did not meet the requested alignment.
For example, allocate(24, 16) would select the pool for 24-byte objects
and the second allocation from that pool (at offset 24 bytes into the
pool) is only 8-byte aligned not 16-byte aligned.
The pool resources need to round up the requested allocation size to a
multiple of the alignment, so that the selected pool will always return
allocations that meet the alignment requirement.
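The rounding step can be sketched as below. The helper round_up_to_alignment is hypothetical and is not the choose_block_size function added by the patch; it assumes the alignment is a power of two, as allocator alignments are.

```cpp
#include <cassert>
#include <cstddef>

// Round size up to the next multiple of alignment (a power of two).
inline std::size_t round_up_to_alignment(std::size_t size,
                                         std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}
```

For the allocate(24, 16) case above, the rounded size is 32, so consecutive blocks in the selected pool start at offsets 0, 32, 64 and so on, each a multiple of 16.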
libstdc++-v3/ChangeLog:
PR libstdc++/118681
* src/c++17/memory_resource.cc (choose_block_size): New
function.
(synchronized_pool_resource::do_allocate): Use choose_block_size
to determine appropriate block size.
(synchronized_pool_resource::do_deallocate): Likewise
(unsynchronized_pool_resource::do_allocate): Likewise.
(unsynchronized_pool_resource::do_deallocate): Likewise
* testsuite/20_util/synchronized_pool_resource/118681.cc: New
test.
* testsuite/20_util/unsynchronized_pool_resource/118681.cc: New
test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Marek Polacek [Tue, 8 Jul 2025 14:09:36 +0000 (10:09 -0400)]
c++: bogus error with union in qualified name [PR83469]
While working on Reflection I noticed that we reject:
union U { int i; };
constexpr auto r = ^^typename ::U;
which is due to PR83469. Andrew P. posted a patch in 2021:
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586344.html
for which I had some comments but an updated patch never came.
~~
There are a few issues here with typenames and unions (and even struct
keywords with unions). First in cp_parser_check_class_key,
we need to allow typenames to name union types and union key
to be able to use with typenames.
The next issue is we need to record if we had a union key,
right now we just record it was a struct/class/typename one
which is wrong.
~~
This patch is an updated and cleaned up version; I've also addressed
a missing bit in pt.cc.
PR c++/83469
PR c++/93809
gcc/cp/ChangeLog:
* cp-tree.h (UNION_TYPE_P): Define.
(TYPENAME_IS_UNION_P): Define.
* decl.cc (struct typename_info): Add union_p field.
(struct typename_hasher::equal): Compare union_p field.
(build_typename_type): Use ti.union_p for union_type. Set
TYPENAME_IS_UNION_P.
* error.cc (dump_type) <case TYPENAME_TYPE>: Handle
TYPENAME_IS_UNION_P.
* module.cc (trees_out::type_node): Likewise.
* parser.cc (cp_parser_check_class_key): Allow typename key for union
types and allow union keyword for typename types.
* pt.cc (tsubst) <case TYPENAME_TYPE>: Don't conflate unions with
class_type. For TYPENAME_IS_CLASS_P, check NON_UNION_CLASS_TYPE_P
rather than CLASS_TYPE_P. Add TYPENAME_IS_UNION_P handling.
gcc/testsuite/ChangeLog:
* g++.dg/template/error45.C: Adjust dg-error.
* g++.dg/warn/Wredundant-tags-3.C: Remove xfail.
* g++.dg/parse/union1.C: New test.
* g++.dg/parse/union2.C: New test.
* g++.dg/parse/union3.C: New test.
* g++.dg/parse/union4.C: New test.
* g++.dg/parse/union5.C: New test.
* g++.dg/parse/union6.C: New test.
Co-authored-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
xtensa: Fix B[GE/LT]UI instructions with immediate values of 32768 or 65536 not being emitted
This is because in canonicalize_comparison() in gcc/expmed.cc, the COMPARE
rtx_cost() for the immediate values in the title does not change between
the old and new versions. This patch fixes that.
(note: Currently, this patch only works if some constant propagation
optimizations are enabled (-O2 or higher) or if bare large constant
assignments are possible (-mconst16 or -mauto-litpools). In the future
I hope to make it work at -O1...)
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_b4const_or_zero):
Remove.
(xtensa_b4const): Add a case where the value is 0, and rename
to xtensa_b4const_or_zero.
(xtensa_rtx_costs): Fix to also consider the result of
xtensa_b4constu().
Jakub Jelinek [Tue, 8 Jul 2025 17:21:55 +0000 (19:21 +0200)]
c++: Implement part of C++26 P2686R4 - constexpr structured bindings [PR117784]
The following patch implements the constexpr structured bindings part of
the P2686R4 paper, so the [dcl.pre], [dcl.struct.bind], [dcl.constinit]
and first hunk in [dcl.constexpr] changes.
The paper doesn't have a feature test macro and the constexpr structured
binding part of it seems more-less self-contained, so I think it is useful
to get this in independently from the rest.
Of course, automatic constexpr/constinit structured bindings in the
tuple cases or automatic constexpr/constinit structured bindings with auto &
will not really work for now.
Another reason for the split is that for C++ < 26, I think what the patch
implements is basically what the users will see, i.e. we can accept
constexpr or constinit structured binding with pedwarn, but I think we can't
change the constant expression rules in C++ < 26.
I plan to look at the rest of the paper.
2025-07-08 Jakub Jelinek <jakub@redhat.com>
PR c++/117784
* decl.cc: Implement part of C++26 P2686R4 - constexpr structured
bindings.
(cp_finish_decl): Pedwarn for C++23 and older on constinit on
structured bindings except for static/thread_local where it uses
earlier error.
(grokdeclarator): Pedwarn on constexpr structured bindings for
C++23 and older instead of emitting error always, don't clear
constexpr_p in that case.
* parser.cc (cp_parser_decomposition_declaration): Copy over
DECL_DECLARED_CONSTEXPR_P and DECL_DECLARED_CONSTINIT_P flags.
* g++.dg/cpp1z/decomp3.C (test): For constexpr structured binding
initialize from constexpr var instead of non-constexpr and expect
just a pedwarn for C++23 and older instead of error always.
* g++.dg/cpp26/decomp9.C (foo): Likewise.
* g++.dg/cpp26/decomp22.C: New test.
* g++.dg/cpp26/decomp23.C: New test.
* g++.dg/cpp26/decomp24.C: New test.
* g++.dg/cpp26/decomp25.C: New test.
Tomasz Kamiński [Tue, 8 Jul 2025 08:04:41 +0000 (10:04 +0200)]
libstdc++: Do not expose set_brackets/set_separator for formatter with format_kind other than sequence [PR119861]
The standard defines separate specializations of range-default-formatter, of
which only the one for range_format::sequence provides the set_brackets and
set_separator methods. We implemented it as one specialization and exposed
these methods for any range_format other than string or debug_string, i.e. when
range_formatter was used as the underlying formatter.
PR libstdc++/119861
libstdc++-v3/ChangeLog:
* include/std/format (formatter<_Rg, _CharT>::set_separator)
(formatter<_Rg, _CharT>::set_brackets): Constrain with
(format_kind<_Rg> == range_format::sequence).
* testsuite/std/format/ranges/pr119861_neg.cc: New test.
s390: Always compute address of stack protector guard
Computing the address of the thread pointer on s390 involves multiple
instructions and therefore bears the risk that the address of the canary
or intermediate values of it are spilled after prologue in order to be
reloaded for the epilogue. Since there exists no mechanism to ensure
that a value is not coming from stack, as a precaution compute the
address always twice, i.e., one time for the prologue and one time for
the epilogue. Note, even if there were such a mechanism, emitting
optimal code is non-trivial since there exist cases with opposing
requirements as e.g. if the thread pointer is not only computed for the
TLS guard but also for other TLS objects. For the latter accesses it is
desired to spill and reload the thread pointer instead of recomputing it
whereas for the former it is not.
gcc/ChangeLog:
* config/s390/s390.md (stack_protect_get_tpsi): New insn.
(stack_protect_get_tpdi): New insn.
(stack_protect_set): Use new insn.
(stack_protect_test): Use new insn.
gcc/testsuite/ChangeLog:
* gcc.target/s390/stack-protector-guard-tls-1.c: New test.
Richard Biener [Tue, 8 Jul 2025 11:46:01 +0000 (13:46 +0200)]
Avoid IPA opts around guality plumbing
The following avoids inlining the actual main() (renamed to
guality_main) into the guality plumbing. This can cause
jump threading opportunities to appear and generally increase
the chance that what we actually test isn't what we think. Likewise
make guality_check noipa instead of just noinline.
Robin Dapp [Tue, 8 Jul 2025 09:17:41 +0000 (11:17 +0200)]
RISC-V: Ignore non-types in builtin function hash.
If a user passes a string that doesn't represent a variable we still try
to compute a hash for its type. Its tree does not represent a type but
just an exceptional, though. This patch just ignores it, leaving the
error to the checking code later.
libstdc++: Set feature test macro for complete C++23 mdspan [PR107761].
PR libstdc++/107761
libstdc++-v3/ChangeLog:
* include/bits/version.def (mdspan): Set to 202207 and remove
no_stdname.
* include/bits/version.h: Regenerate.
* testsuite/23_containers/mdspan/version.cc: Test presence
of feature test macro.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
Implements the class mdspan as described in N4950, i.e. without P3029.
It also adds tests for mdspan. This commit completes the implementation
of P0009, i.e. the C++23 part <mdspan>.
PR libstdc++/107761
libstdc++-v3/ChangeLog:
* include/std/mdspan (mdspan): New class.
* src/c++23/std.cc.in (mdspan): Add.
* testsuite/23_containers/mdspan/class_mandate_neg.cc: New test.
* testsuite/23_containers/mdspan/mdspan.cc: New test.
* testsuite/23_containers/mdspan/layout_like.h: Add class
LayoutLike which models a user-defined layout.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
The current code uses __mdspan::__fwd_prod(__exts, __rank) to express
computing the size of an extent. This commit adds a function
__mdspan::__size(__exts) to express the idea more directly.
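Conceptually, the size is the product of all extents. The sketch below shows that computation on a plain std::array; the function extents_size is hypothetical and stands in for the internal helper, which operates on std::extents instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Product of all extents; an empty (rank-0) set of extents has size 1.
template <std::size_t Rank>
std::size_t extents_size(const std::array<std::size_t, Rank> &exts) {
  std::size_t prod = 1;
  for (std::size_t i = 0; i < Rank; ++i)
    prod *= exts[i];
  return prod;
}
```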
libstdc++-v3/ChangeLog:
* include/std/mdspan (__mdspan::__size): New function.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
libstdc++: Restructure mdspan tests to reuse IntLike.
The class IntLike is used for testing extents with user-defined classes
that convert to int. This commit places the class into a separate header
file. This allows it to be reused across different parts of the mdspan
related testsuite.
libstdc++-v3/ChangeLog:
* testsuite/23_containers/mdspan/extents/custom_integer.cc:
Delete IntLike and include "int_like.h".
* testsuite/23_containers/mdspan/extents/int_like.h: Add
IntLike.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
was not checked. This commit adds the __glibcxx_assert and tests it.
libstdc++-v3/ChangeLog:
* include/std/mdspan (extents): Check prerequisite of the ctor that
static_extent(i) == dynamic_extent || extent(i) == other.extent(i).
* testsuite/23_containers/mdspan/extents/class_mandates_neg.cc:
Test the implemented prerequisite.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
libstdc++: Check prerequisites of layout_*::operator().
Previously, the prerequisite that the arguments passed to operator() are
a multi-dimensional index (of extents()) was not checked.
Both mapping::operator() and mdspan::operator[] have the same
prerequisite. Since mdspan must check the prerequisite for user-defined
layout mappings, the preference is to check in mdspan.
Because out-of-bounds accesses are very common it's nevertheless useful
to check the prerequisite in mapping::operator(). This is relevant for
cases where the layout mappings are used without mdspan. This commit
checks the prerequisites via _GLIBCXX_DEBUG_ASSERTs and adds the required
tests.
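What it means for the arguments to be a multi-dimensional index can be sketched with a row-major mapping over plain arrays. Both function names are hypothetical, and a plain assert stands in for _GLIBCXX_DEBUG_ASSERT; this is not the libstdc++ implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// True if idx[i] < exts[i] holds in every dimension, i.e. idx is a valid
// multi-dimensional index into exts.
template <std::size_t Rank>
bool is_multi_index(const std::array<std::size_t, Rank> &exts,
                    const std::array<std::size_t, Rank> &idx) {
  for (std::size_t i = 0; i < Rank; ++i)
    if (idx[i] >= exts[i])
      return false;
  return true;
}

// Row-major (layout_right-style) offset with the precondition checked
// before any arithmetic is done.
template <std::size_t Rank>
std::size_t row_major_offset(const std::array<std::size_t, Rank> &exts,
                             const std::array<std::size_t, Rank> &idx) {
  assert(is_multi_index(exts, idx));
  std::size_t off = 0;
  for (std::size_t i = 0; i < Rank; ++i)
    off = off * exts[i] + idx[i];
  return off;
}
```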
* include/std/mdspan: Check prerequisites of
layout_*::operator() with _GLIBCXX_DEBUG_ASSERTs.
* testsuite/23_containers/mdspan/layouts/debug/out_of_bounds_neg.cc:
Add tests for prerequisites.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>
Mklog parses the diff content from prepare-commit-msg hook but fails
when git has been configured with mnemonicPrefix. Forcing the default
values for the prefixes would set a distinct diff configuration supported
by mklog and prevent most failures.
The new vector pattern tests used int128 without guard. This causes
failure on 31bit targets. Split the tests such that the tests
requiring 128 bit support are only executed on targets supporting
them.
Richard Biener [Sun, 25 May 2025 17:29:04 +0000 (19:29 +0200)]
add masked-epilogue tuning
The following adds a x86 tuning to enable the use of AVX512 masked
epilogues in cases we heuristically determine it to be not detrimental
by high chance. Basically problematic cases are when there are
data streams that are both stored and loaded from and an outer loop
could end up executing only the inner loop masked epilogue and with
unlucky data stream advancement from the outer loop end up needing
to forward from masked stores to masked loads. This isn't very
well handled, esp. for the case where unmasked operations would
not need to forward at all - that is, when forwarding completely
from the masked out portion of the store (like the AVX upper half
to the AVX lower half of a load). There's also the case where
the number of iterations is known at compile time, only with
cost comparing we'd consider a non-masked epilog - as we are not
doing that we have to add heuristics to avoid masking when a
single vector epilog iteration would cover all scalar iterations
left (this is exercised by gcc.target/i386/pr110310.c).
SPEC CPU 2017 shows 3% text size savings over not using masked
epilogues with performance impact in the noise. Masking all vector
epilogues gets that to 4% text size savings with some major
runtime regressions in 503.bwaves_r and 527.cam4_r
(measured on a Zen4 system), we're leaving a 5% improvement
for 549.fotonik3d_r unrealized with the implemented heuristic.
With the heuristics we turn 22513 vector epilogues + up to 12305 scalar
epilogues into 12305 masked vector epilogues of which 574 are for
AVX vector sizes, 79 for SSE vector sizes and the rest for AVX512.
When masking all epilogues we get 14567 of them from
29467 vector + up to 14567 scalar epilogues, so the heuristics disable
an additional 20% of masked epilogues.
* config/i386/x86-tune.def (X86_TUNE_AVX512_MASKED_EPILOGUES):
New tunable, default on for m_ZNVER4 and m_ZNVER5.
* config/i386/i386.cc (ix86_vector_costs::finish_cost): With
X86_TUNE_AVX512_MASKED_EPILOGUES and when the main loop
had a vectorization factor > 2 use a masked epilogue when
possible and when not obviously problematic.
* gcc.target/i386/vect-mask-epilogue-1.c: New testcase.
* gcc.target/i386/vect-mask-epilogue-2.c: Likewise.
* gcc.target/i386/vect-epilogues-3.c: Adjust.
Richard Biener [Sun, 25 May 2025 17:28:54 +0000 (19:28 +0200)]
Allow the target to request a masked vector epilogue
Targets recently got the ability to request the vector mode to be
used for a vector epilogue (or the epilogue of a vector epilogue). The
following adds the ability for it to indicate the epilogue should use
loop masking, irrespective of the --param vect-partial-vector-usage
default setting.
The patch below uses a separate flag from the epilogue mode, not
addressing the issue that on x86 the vector_modes mode iteration
hook would not allow for both masked and unmasked variants to be
tried and costed given this doesn't naturally map to modes on
that target. That's left for a future exercise - turning on
cost comparison for the x86 backend would be a prerequisite there.
* tree-vectorizer.h (vector_costs::suggested_epilogue_mode):
Add masked output parameter and return m_masked_epilogue.
(vector_costs::m_masked_epilogue): New tristate flag.
(vector_costs::vector_costs): Initialize m_masked_epilogue.
* tree-vect-loop.cc (vect_analyze_loop_1): Pass in masked
flag to optionally initialize can_use_partial_vectors_p.
(vect_analyze_loop): For epilogues also get whether to use
a masked epilogue for this loop from the target and use
that for the first epilogue mode we try.
Fortran: Ensure finalizers are created correctly [PR120637]
finalize_component freed an expression that it used to remember which
components in which context it had already finalized. While it makes
sense to free the copy of the expression if it is unused, doing so causes
issues when comparing against a nonexistent expression. This is now
detected by returning true when the expression has been used.
PR fortran/120637
gcc/fortran/ChangeLog:
* class.cc (finalize_component): Return true when a finalizable
component was detected and do not free it.
Richard Biener [Mon, 7 Jul 2025 13:13:38 +0000 (15:13 +0200)]
tree-optimization/120358 - bogus PTA with structure access
When we compute the constraint for something like
MEM[(const struct QStringView &)&tok2 + 32] we go and compute
what (const struct QStringView &)&tok2 + 32 points to and then
add subvariables to its dereference that possibly fall in the
range of the access according to the original refs size. In
doing that we disregarded that the subvariable the starting
address points to might not be aligned to it and thus the
access might start at any point within that variable. The following
conservatively adjusts the pruning of adjacent sub-variables to
honor this.
PR tree-optimization/120358
* tree-ssa-structalias.cc (get_constraint_for_1): Adjust
pruning of sub-variables according to the imprecise
known start offset.
François Dumont [Thu, 27 Mar 2025 18:02:59 +0000 (19:02 +0100)]
libstdc++: Make debug iterator pointer sequence const [PR116369]
In revision a35dd276cbf6236e08bcf6e56e62c2be41cf6e3c the debug sequence
was made mutable to allow attaching iterators to const containers.
This change completes that fix by also declaring debug unordered container
members mutable.
Additionally, the debug iterator sequence is now a pointer-to-const, and so
_Safe_sequence_base _M_attach and all other methods are const qualified.
Exported non-const methods are preserved for ABI backward compatibility.
libstdc++-v3/ChangeLog:
PR c++/116369
* config/abi/pre/gnu-versioned-namespace.ver: Use new const qualified symbols.
* config/abi/pre/gnu.ver: Add new const qualified symbols.
* include/debug/safe_base.h
(_Safe_iterator_base::_M_sequence): Declare as pointer-to-const.
(_Safe_iterator_base::_M_attach, _M_attach_single): New, take pointer-to-const
_Safe_sequence_base.
(_Safe_sequence_base::_M_detach_all, _M_detach_singular, _M_revalidate_singular)
(_M_swap, _M_get_mutex): New, const qualified.
(_Safe_sequence_base::_M_attach, _M_attach_single, _M_detach, _M_detach_single):
const qualify.
* include/debug/safe_container.h (_Safe_container<>::_M_cont): Add const qualifier.
(_Safe_container<>::_M_swap_base): New.
(_Safe_container(_Safe_container&&, const _Alloc&, std::false_type)):
Adapt to use latter.
(_Safe_container<>::operator=(_Safe_container&&)): Likewise.
(_Safe_container<>::_M_swap): Likewise and take parameter as const reference.
* include/debug/safe_unordered_base.h
(_Safe_local_iterator_base::_M_safe_container): New.
(_Safe_local_iterator_base::_Safe_local_iterator_base): Take
_Safe_unordered_container_base as pointer-to-const.
(_Safe_unordered_container_base::_M_attach, _M_attach_single): New, take
container as _Safe_unordered_container_base pointer-to-const.
(_Safe_unordered_container_base::_M_local_iterators, _M_const_local_iterators):
Add mutable.
(_Safe_unordered_container_base::_M_detach_all, _M_swap): New, const qualify.
(_Safe_unordered_container_base::_M_attach_local, _M_attach_local_single)
(_M_detach_local, _M_detach_local_single): Add const qualifier.
* include/debug/safe_unordered_container.h (_Safe_unordered_container::_M_self()): New.
* include/debug/safe_unordered_container.tcc
(_Safe_unordered_container::_M_invalidate_if, _M_invalidated_local_if): Use latter.
* include/debug/safe_iterator.h (_Safe_iterator<>::_M_attach, _M_attach_single):
Take _Safe_sequence_base as pointer-to-const.
(_Safe_iterator<>::_M_get_sequence): Add const_cast and comment about it.
* include/debug/safe_local_iterator.h (_Safe_local_iterator<>): Replace usages
of _M_sequence member by _M_safe_container().
(_Safe_local_iterator<>::_M_attach, _M_attach_single): Take
_Safe_unordered_container_base as pointer-to-const.
(_Safe_local_iterator<>::_M_get_sequence): Rename into...
(_Safe_local_iterator<>::_M_get_ucontainer): ...this. Add necessary const_cast and
comment to explain it.
(_Safe_local_iterator<>::_M_is_begin, _M_is_end): Adapt.
* include/debug/safe_local_iterator.tcc: Adapt.
* include/debug/safe_sequence.h
(_Safe_sequence<>::_M_invalidate_if, _M_transfer_from_if): Add const qualifier.
* include/debug/safe_sequence.tcc: Adapt.
* include/debug/deque (std::__debug::deque::erase): Adapt to use new const
qualified methods.
* include/debug/formatter.h: Adapt.
* include/debug/forward_list (_Safe_forward_list::_M_this): Add const
qualification and return pointer for consistency with 'this' keyword.
(_Safe_forward_list::_M_swap_aux): Rename into...
(_Safe_forward_list::_S_swap_aux): ...this and take sequence as const reference.
(forward_list<>::resize): Adapt to use const methods.
* include/debug/list (list<>::resize): Likewise.
* src/c++11/debug.cc: Adapt to const qualification.
* testsuite/util/testsuite_containers.h
(forward_members_unordered::forward_members_unordered): Add check on local_iterator
conversion to const_local_iterator.
(forward_members::forward_members): Add check on iterator conversion to
const_iterator.
* testsuite/23_containers/unordered_map/const_container.cc: New test case.
* testsuite/23_containers/unordered_multimap/const_container.cc: New test case.
* testsuite/23_containers/unordered_multiset/const_container.cc: New test case.
* testsuite/23_containers/unordered_set/const_container.cc: New test case.
* testsuite/23_containers/vector/debug/mutex_association.cc: Adapt.
VxWorks6 used symbols __GOTT_BASE__ and __GOTT_INDEX__ to obtain the
address of the global offset table. Starting with VxWorks7, that is
no longer the case, but we've still issued these symbols in
output_set_got. Do that only with VxWorks<7.
Switching to the call-based PIC register sequence, we have to set the
flag that prevents the use of the red zone, and AFAICT the reasons
that ruled out GOTOFF and other relative addressing no longer apply to
VxWorks7+.
for gcc/ChangeLog
* config/vxworks-dummy.h (TARGET_VXWORKS_VAROFF): New.
(TARGET_VXWORKS_GOTTPIC): New.
* config/vxworks.h (TARGET_VXWORKS_VAROFF): Override.
(TARGET_VXWORKS_GOTTPIC): Likewise.
* config/i386/i386.cc (output_set_got): Disable VxWorks6 GOT
sequence on VxWorks7.
(legitimize_pic_address): Accept relative addressing of
labels on VxWorks7.
(ix86_delegitimize_address_1): Likewise.
(ix86_output_addr_diff_elt): Likewise.
* config/i386/i386.md (tablejump): Likewise.
(set_got, set_got_labelled): Set no-red-zone flag on VxWorks7.
* config/i386/predicates.md (gotoff_operand): Test
TARGET_VXWORKS_VAROFF.
Jeff Law [Tue, 8 Jul 2025 02:48:17 +0000 (20:48 -0600)]
[committed] Minor fix to gcc.dg/torture/pr120654.c
I don't recall which port complained, but pr120654.c was failing on one or more
of the embedded targets due to the use of malloc/free. This change just turns
them into the __builtin variants which makes everyone happy again.
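As a rough illustration of the kind of change involved (this is not the test itself; the function name and workload are hypothetical), code can call the compiler builtins directly rather than the library-declared malloc/free:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: allocate a scratch buffer through the compiler
// builtins (a GCC/Clang extension) instead of naming malloc/free directly.
inline long sum_first_n(std::size_t n) {
  assert(n > 0);  // malloc(0) may legitimately return a null pointer
  long* buf = static_cast<long*>(__builtin_malloc(n * sizeof(long)));
  if (buf == nullptr)
    return -1;  // allocation failure
  for (std::size_t i = 0; i < n; ++i)
    buf[i] = static_cast<long>(i) + 1;  // 1, 2, ..., n
  long total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += buf[i];
  __builtin_free(buf);
  return total;
}
```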
gcc/testsuite
* gcc.dg/torture/pr120654.c: Use __builtin variants of malloc and free.
Jeff Law [Tue, 8 Jul 2025 02:42:04 +0000 (20:42 -0600)]
[committed][RISC-V] Fix testsuite fallout from check-function-bodies change
Minor fallout from HJ's recent change to the check-function-bodies code in the testsuite.
The label isn't at all important here, so forcing it to match is just a waste of time. So this patch just skips over the label. It fixes a handful of failures in the testsuite:
It's not strictly necessary, because nothing defined therein is
referenced by anything in gcc/config/aarch64, but it was an oversight
to not have it there.
for gcc/ChangeLog
* config.gcc (vxworks-dummy.h): Add to aarch64-*-* as well.
Jonathan Wakely [Fri, 4 Jul 2025 20:19:52 +0000 (21:19 +0100)]
libstdc++: Fix attribute order on __normal_iterator friends [PR120949]
In r16-1911-g6596f5ab746533 I claimed to have reordered some attributes
for compatibility with Clang, but it looks like I got the Clang
restriction backwards and put them all in the wrong order. Clang trunk
accepts either order (probably since the llvm/llvm-project#133107 fix)
but released versions still require a particular order.
There were also some cases where the attributes were after the friend
keyword, which Clang trunk still rejects.
libstdc++-v3/ChangeLog:
PR libstdc++/120949
* include/bits/stl_iterator.h (__normal_iterator): Fix order of
always_inline and nodiscard attributes for Clang compatibility.
Jonathan Wakely [Tue, 1 Jul 2025 11:44:04 +0000 (12:44 +0100)]
libstdc++: Make VERIFY a variadic macro
This defines the testsuite assertion macro VERIFY so that it allows
un-parenthesized expressions containing commas. This matches how assert
is defined in C++26, following the approval of P2264R7.
The primary motivation is to allow expressions that the preprocessor
splits into multiple arguments, e.g.
VERIFY( vec == std::vector<int>{1,2,3,4} );
To achieve this, VERIFY is redefined as a variadic macro and then the
arguments are grouped together again through the use of __VA_ARGS__.
The implementation is complex due to the following points:
- The arguments __VA_ARGS__ are contextually-converted to bool, so that
scoped enums and types that are not contextually convertible to bool
cannot be used with VERIFY.
- bool(__VA_ARGS__) is used so that multiple arguments (i.e. those which
are separated by top-level commas) are ill-formed. Nested commas are
allowed, but likely mistakes such as VERIFY( cond, "some string" ) are
ill-formed.
- The bool(__VA_ARGS__) expression needs to be unevaluated, so that we
don't evaluate __VA_ARGS__ more than once. The simplest way to do that
would be just sizeof bool(__VA_ARGS__), without parentheses to avoid a
vexing parse for VERIFY(bool(i)). However that wouldn't work for e.g.
VERIFY( []{ return true; }() ), because lambda expressions are not
allowed in unevaluated contexts until C++20. So we use another
conditional expression with bool(__VA_ARGS__) as the unevaluated
operand.
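The points above can be sketched roughly as follows. This is a simplified stand-in, not the testsuite_hooks.h definition: SKETCH_VERIFY, sketch_fail and sketch_failures are hypothetical names, and failures are counted instead of aborting so the behaviour is easy to observe.

```cpp
#include <cassert>
#include <vector>

static int sketch_failures = 0;  // hypothetical failure counter

// Hypothetical failure hook; a real assertion macro would report and abort.
static void sketch_fail(const char*) { ++sketch_failures; }

// (__VA_ARGS__) is evaluated once and contextually converted to bool.
// bool(__VA_ARGS__) sits in the never-taken branch of "true ? true : ...",
// so it is type-checked but never evaluated; a top-level comma such as
// SKETCH_VERIFY(cond, "msg") makes bool(cond, "msg") ill-formed.
#define SKETCH_VERIFY(...)                               \
  ((void)((__VA_ARGS__)                                  \
            ? (void)(true ? true : bool(__VA_ARGS__))    \
            : sketch_fail(#__VA_ARGS__)))

static int verify_examples() {
  std::vector<int> vec{1, 2, 3, 4};
  SKETCH_VERIFY( vec == std::vector<int>{1,2,3,4} );  // commas inside braces
  SKETCH_VERIFY( []{ return true; }() );              // lambda expression
  SKETCH_VERIFY( vec.size() == 3 );                   // false, so counted
  assert(sketch_failures == 1);
  return sketch_failures;
}
```

The preprocessor splits the first check into four macro arguments, and __VA_ARGS__ pastes them back together with their commas.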
libstdc++-v3/ChangeLog:
* testsuite/util/testsuite_hooks.h (VERIFY): Define as variadic
macro.
* testsuite/ext/verify_neg.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Mon, 7 Jul 2025 10:32:48 +0000 (11:32 +0100)]
libstdc++: Use template keyword in __mapping_of alias template
This is needed to fix an error with Clang 19:
include/c++/16.0.0/mdspan:512:30: error: use 'template' keyword to treat 'mapping' as a dependent template name
512 | is_same_v<typename _Layout::mapping<typename _Mapping::extents_type>,
| ^
Revert "Extend "counted_by" attribute to pointer fields of structures. Convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE." due to PR120929.
Martin Jambor [Mon, 7 Jul 2025 19:50:19 +0000 (21:50 +0200)]
Ignore more clang warnings in contrib/filter-clang-warnings.py
in contrib we have a script filter-clang-warnings.py which supposedly
filters out uninteresting warnings emitted by clang when it compiles
GCC. I'm not sure if anyone else uses it but our internal SUSE
testing infrastructure does.
Since Martin Liška left, I have mostly ignored the warnings and so
they have multiplied. In an effort to improve the situation, I have
tried to fix those warnings which I think are worth it and would like
to adjust the filtering script so that we get to zero "interesting"
warnings again.
The changes are the following:
1. Ignore -Woverloaded-shift-op-parentheses warnings. IIUC, those
make some sense when << and >> are used for I/O but since that is
not the case in GCC they are not really interesting.
2. Ignore -Wunused-function and -Wunneeded-internal-declaration. I
think it is OK to occasionally prepare APIs before they are used
(and with our LTO we should be able to get rid of them).
3. Ignore -Wvla-cxx-extension and -Wunused-command-line-argument which
just don't seem to be useful.
4. Ignore -Wunused-private-field warning in diagnostic-path-output.cc
which can only be correct if quite a few functions are removed and
looks like it is not just an oversight:
gcc/diagnostic-path-output.cc:271:35: warning: private field 'm_logical_loc_mgr' is not used [-Wunused-private-field]
5. Ignore a case in -Wunused-but-set-variable about named_args which
is used in a piece of code behind an ifdef in ipa-strub.cc.
6. Adjust the gimple-match and generic-match filters to the fact that
we now have multiple such files.
7. Ignore warnings about using memcpy to copy around wide_ints, like
the one below. I seem to remember wide-int has undergone fairly
rigorous review and TBH I just hope I know what we are doing.
gcc/wide-int.h:1198:11: warning: first argument in call to 'memcpy' is a pointer to non-trivially copyable type 'wide_int_storage' [-Wnontrivial-memcall]
8. Ignore -Wc++11-narrowing warning reported in omp-builtins.def when
it is included from JIT. The code probably has a bigger issue
described in PR 120960.
9. Since the patch number 14 in the original series did not get
approved, I assume that private member field m_wanted_type of class
element_expected_type_with_indirection in c-family/c-format.cc will
get a use sooner or later, so I ignore a warning about it being
unused.
10. I have decided to ignore warnings in m2/gm2-compiler-boot about
unused stuff (all reported unused stuff are variables). These
sources are in the build directory so I assume they are somehow
generated and so warnings about unused things are a bit expected
and probably not too bad.
11. On the Zulip chat, I have informed Rust folks they have a bunch of
-Wunused-private-field cases in the FE. Until they sort it out
I'm ignoring these. I might add the missing explicit type-cast
case here too if it takes time for the patch I'm posting in this
series to reach master.
12. I ignore warning about use of offsetof in libiberty/sha1.c which is
apparently only a "C23 extension:"
libiberty/sha1.c:239:11: warning: defining a type within 'offsetof' is a C23 extension [-Wc23-extensions]
libiberty/sha1.c:460:11: warning: defining a type within 'offsetof' is a C23 extension [-Wc23-extensions]
13. I have enlarged the list of .texi files where warnings somehow got
reported. Not sure why that happens.
14. In analyzer/sm.cc there are several "no-op" methods which have
named but unused parameters. It seems this is deliberate and so I
have filtered the -Wunused-parameter warning for this file.
I have also re-arranged the entries in a way which hopefully makes
somewhat more sense.
Thanks,
Martin
contrib/ChangeLog:
2025-07-07 Martin Jambor <mjambor@suse.cz>
* filter-clang-warnings.py (skip_warning): Also ignore
-Woverloaded-shift-op-parentheses, -Wunused-function,
-Wunneeded-internal-declaration, -Wvla-cxx-extension, and
-Wunused-command-line-argument everywhere and a warning about
m_logical_loc_mgr in diagnostic-path-output.cc. Adjust gimple-match
and generic-match "filenames." Ignore -Wnontrivial-memcall warnings
in wide-int.h, all warnings about unused stuff in files under
m2/gm2-compiler-boot, all -Wunused-private-field in rust FE, in
analyzer/ana-state-to-diagnostic-state.h and c-family/c-format.cc, all
warnings in avr-mmcu.texi, install.texi and libgccjit.texi and all
-Wc23-extensions warnings in libiberty/sha1.c. Ignore
-Wunused-parameter in analyzer/sm.cc. Reorder entries.
Martin Jambor [Mon, 7 Jul 2025 19:48:16 +0000 (21:48 +0200)]
ranger: Mark three occurrences of verify_range with override
In line with my previous patches introducing override where clang
warnings indicate that they are missing, this patch adds it to three
new member functions overriding ancestor virtual functions that do not
have them.
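For readers unfamiliar with the specifiers, a minimal sketch of what such markings look like. The class names here are hypothetical and are not the ones in value-range.h:

```cpp
#include <cassert>

// Hypothetical base class with a virtual checker.
struct range_sketch {
  virtual ~range_sketch() = default;
  virtual int verify_range() const { return 0; }
};

// 'override' makes the compiler reject the declaration if it does not
// actually override a virtual function of a base class.
struct int_range_sketch : range_sketch {
  int verify_range() const override { return 1; }
};

// 'final override' additionally forbids further overriding in derived classes.
struct ptr_range_sketch : range_sketch {
  int verify_range() const final override { return 2; }
};

// Dispatch through the base class to show the overriders are picked up.
inline int verify_via_base(const range_sketch& r) { return r.verify_range(); }

inline void check_dispatch() {
  assert(verify_via_base(int_range_sketch{}) == 1);
  assert(verify_via_base(ptr_range_sketch{}) == 2);
}
```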
Since Andrew has pre-approved such changes for ranger, I am going to
push it to master after bootstrapping it on x86_64-linux.
Thanks,
Martin
gcc/ChangeLog:
2025-07-07 Martin Jambor <mjambor@suse.cz>
* value-range.h (class irange): Mark member function verify_range
with override.
(class prange): Mark member function verify_range with final override.
(class frange): Mark member function verify_range with override.
H.J. Lu [Mon, 30 Jun 2025 20:46:31 +0000 (04:46 +0800)]
xtensa: Remove TARGET_PROMOTE_PROTOTYPES
xtensa ABI requires sign extension of signed 8/16-bit arguments to 32
bits and zero extension of unsigned 8/16-bit arguments to 32 bits.
TARGET_PROMOTE_PROTOTYPES is an optimization, not an ABI requirement.
Remove TARGET_PROMOTE_PROTOTYPES and define xtensa_promote_function_mode
to properly extend 8/16-bit arguments to 32 bits.
Juergen Christ [Fri, 20 Jun 2025 14:08:34 +0000 (16:08 +0200)]
s390: Optimize fmin/fmax.
On VXE targets, we can directly use the fp min/max instruction instead of
calling into libm for fmin/fmax etc.
Provide fmin/fmax versions also for vectors even though they cannot be
called directly. This will be exploited with a follow-up patch when
reductions are introduced.
gcc/ChangeLog:
* config/s390/s390.md: Update UNSPECs
* config/s390/vector.md (fmax<mode>3): New expander.
(fmin<mode>3): New expander.
* config/s390/vx-builtins.md (*fmin<mode>): New insn.
(vfmin<mode>): Redefined to use new insn.
(*fmax<mode>): New insn.
(vfmax<mode>): Redefined to use new insn.
gcc/testsuite/ChangeLog:
* gcc.target/s390/fminmax-1.c: New test.
* gcc.target/s390/fminmax-2.c: New test.
Alfie Richards [Thu, 13 Feb 2025 15:59:43 +0000 (15:59 +0000)]
c++: Fix FMV return type ambiguation
Add logic for the case of two FMV annotated functions with identical
signature other than the return type.
Previously this was ignored; this changes the behavior to emit a diagnostic.
gcc/cp/ChangeLog:
PR c++/119498
* decl.cc (duplicate_decls): Change logic to not always exclude FMV
annotated functions in cases of return type non-ambiguation.
gcc/testsuite/ChangeLog:
PR c++/119498
* g++.target/aarch64/pr119498.C: New test.
In r14-1659 I added a missing error for a Concepts TS feature that we were
failing to diagnose, but this PR requests a way to disable that error for
code written thinking it was valid. Which seems reasonable, since it
doesn't require any work beyond that and is a plausible extension by itself.
While looking at this, I also noticed we were still not giving the
diagnostic in a few cases, and fixing that affected a few of our old
concepts testcases.
* parser.cc (cp_parser_simple_type_specifier): Attach
auto in targ in parameter to -Wabbreviated-auto-in-template-arg.
(cp_parser_placeholder_type_specifier): Diagnose constrained auto in
template arg.
gcc/testsuite/ChangeLog:
* g++.dg/concepts/auto7a.C: Add diagnostic.
* g++.dg/concepts/auto7b.C: New test.
* g++.dg/concepts/auto7c.C: New test.
* g++.dg/cpp1y/pr85076.C: Expect 'auto' error.
* g++.dg/concepts/pr67249.C: Likewise.
* g++.dg/cpp1y/lambda-generic-variadic.C: Likewise.
* g++.dg/cpp2a/concepts-pr67210.C: Likewise.
* g++.dg/concepts/pr67249a.C: New test.
* g++.dg/cpp1y/lambda-generic-variadic-a.C: New test.
* g++.dg/cpp2a/concepts-pr67210a.C: New test.
The TImode popcount sequence can be slightly improved with SVE.
If we generate:
ldr q31, [x0]
ptrue p7.b, vl16
cnt z31.d, p7/m, z31.d
addp d31, v31.2d
fmov x0, d31
ret
we use the ADDP instruction for reduction, which is cheaper on all CPUs AFAIK,
as it is only a single 64-bit addition vs the tree of additions for ADDV.
For example, on a CPU like Grace we get a latency and throughput of 2,4 vs 4,1
for ADDV.
We do generate one more instruction due to the PTRUE being materialised, but that
is cheap itself and can be scheduled away from the critical path or even CSE'd
with other PTRUE constants.
As this sequence is larger code size-wise it is avoided for -Os.
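A scalar model of the sequence above, assuming a compiler that provides unsigned __int128 and __builtin_popcountll (GCC and Clang do); the function name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Count bits in each 64-bit half separately (the role of CNT on .d lanes),
// then add the two per-lane counts with one addition (the role of ADDP).
inline unsigned popcount_ti(unsigned __int128 x) {
  std::uint64_t lo = static_cast<std::uint64_t>(x);
  std::uint64_t hi = static_cast<std::uint64_t>(x >> 64);
  unsigned cnt_lo = static_cast<unsigned>(__builtin_popcountll(lo));
  unsigned cnt_hi = static_cast<unsigned>(__builtin_popcountll(hi));
  return cnt_lo + cnt_hi;  // single 64-bit style pairwise add
}

inline void check_popcount_ti() {
  assert(popcount_ti(0) == 0);
  assert(popcount_ti(~static_cast<unsigned __int128>(0)) == 128);
}
```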
Bootstrapped and tested on aarch64-none-linux-gnu.
libstdc++: Format chrono %a/%A/%b/%h/%B/%p using locale's time_put [PR117214]
The C++ formatting locale could have a custom time_put that behaves
differently from the C locale, so do not use __timepunct directly;
instead, all of the above specifiers use _M_locale_fmt.
For %a/%A/%b/%h/%B, the code handling the exception is now moved
to the _M_check_ok function, that is invoked before handling of the
conversion specifier. For time_points the values of months/weekday
are computed, and thus are always ok(), this information is indicated
by new _M_time_point member of the _ChronoSpec.
The different behavior of the j specifier for durations and time_points/calendar
types is now handled using only _ChronoParts, and _M_time_only in _ChronoSpec
is no longer needed, so it was removed.
PR libstdc++/117214
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (_ChronoSpec::_M_time_only): Remove.
(_ChronoSpec::_M_time_point): Define.
(__formatter_chrono::_M_parse): Use __parts to determine
interpretation of j.
(__formatter_chrono::_M_check_ok): Define.
(__formatter_chrono::_M_format_to): Invoke _M_check_ok.
(__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B): Move
exception throwing to _M_check_ok.
(__formatter_chrono::_M_j): Use _M_needs to define interpretation.
(__formatter_duration::_S_spec_for): Set _M_time_point.
* testsuite/std/time/format/format.cc: Test for exception for !ok()
months/weekday.
* testsuite/std/time/format/pr117214_custom_timeput.cc: New
test.
Co-authored-by: Tomasz Kaminski <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: XU Kailiang <xu2k3l4@outlook.com>
Signed-off-by: Tomasz Kaminski <tkaminsk@redhat.com>
Richard Biener [Mon, 7 Jul 2025 07:56:50 +0000 (09:56 +0200)]
tree-optimization/120817 - bogus DSE of .MASK_STORE
DSE used ao_ref_init_from_ptr_and_size for .MASK_STORE but
alias-analysis will use the specified size to disambiguate
against smaller objects. For .MASK_STORE we instead have to
make the access size unspecified but we can still constrain
the access extent based on the maximum size possible.
PR tree-optimization/120817
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Use
ao_ref_init_from_ptr_and_range with unknown size for
.MASK_STORE and .MASK_LEN_STORE.
Pan Li [Wed, 2 Jul 2025 02:52:25 +0000 (10:52 +0800)]
RISC-V: Add test cases for unsigned scalar SAT_MUL from uint128_t
Add run and tree-optimized check for unsigned scalar SAT_MUL from
uint128_t.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_arith_data.h: Add test data for
run test.
* gcc.target/riscv/sat/sat_u_mul-1-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u8-from-u128.c: New test.
Pan Li [Wed, 2 Jul 2025 02:35:10 +0000 (10:35 +0800)]
RISC-V: Implement unsigned scalar SAT_MUL from uint128_t
This patch implements the unsigned scalar SAT_MUL from uint128_t,
i.e.:
NT __attribute__((noinline))
sat_u_mul_##NT##_fmt_1 (NT a, NT b)
{
uint128_t x = (uint128_t)a * (uint128_t)b;
NT max = -1;
if (x > (uint128_t)(max))
return max;
else
return (NT)x;
}
Take uint64_t and uint8_t as examples:
Before this patch for uint8_t:
sat_u_mul_uint8_t_from_uint128_t_fmt_1:
mulhu a5,a0,a1
mul a0,a0,a1
bne a5,zero,.L3
li a5,255
bleu a0,a5,.L4
.L3:
li a0,255
.L4:
andi a0,a0,0xff
ret
After this patch for uint8_t:
sat_u_mul_uint8_t_from_uint128_t_fmt_1:
mul a0,a0,a1
li a5,255
sltu a5,a5,a0
neg a5,a5
or a0,a0,a5
andi a0,a0,0xff
ret
Before this patch for uint64_t:
sat_u_mul_uint64_t_from_uint128_t_fmt_1:
mulhu a5,a0,a1
mul a0,a0,a1
beq a5,zero,.L4
li a0,-1
.L4:
ret
After this patch for uint64_t:
sat_u_mul_uint64_t_from_uint128_t_fmt_1:
mulhsu a5,a1,a0
mul a0,a0,a1
snez a5,a5
neg a5,a5
or a0,a0,a5
ret
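The branchless idea behind the "after" sequences can be modelled in portable code roughly as below. This is only a sketch of the mask trick (compare, negate, or); the function names are hypothetical and it is not the expander code in riscv.cc.

```cpp
#include <cassert>
#include <cstdint>

// Narrow case: the full product fits in a 64-bit register, so compare it
// against the maximum and turn the 0/1 result into a 0/all-ones mask.
inline std::uint8_t sat_u_mul_u8_branchless(std::uint8_t a, std::uint8_t b) {
  std::uint64_t prod = static_cast<std::uint64_t>(a) * b;  // mul
  std::uint64_t over = prod > 255u ? 1u : 0u;              // set if 255 < prod
  std::uint64_t mask = 0 - over;                           // neg: 0 or ~0
  return static_cast<std::uint8_t>((prod | mask) & 0xff);  // or, then truncate
}

// Register-width case: overflow means the high 64 bits of the unsigned
// 128-bit product are nonzero.
inline std::uint64_t sat_u_mul_u64_branchless(std::uint64_t a, std::uint64_t b) {
  unsigned __int128 wide = static_cast<unsigned __int128>(a) * b;
  std::uint64_t hi = static_cast<std::uint64_t>(wide >> 64);  // multiply-high
  std::uint64_t lo = static_cast<std::uint64_t>(wide);        // mul
  std::uint64_t mask = 0 - static_cast<std::uint64_t>(hi != 0);  // snez; neg
  return lo | mask;                                            // or
}

inline void check_branchless() {
  assert(sat_u_mul_u8_branchless(16, 16) == 255);  // 256 saturates
  assert(sat_u_mul_u64_branchless(3, 4) == 12);
}
```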
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_expand_usmul): Add new func
decl.
* config/riscv/riscv.cc (riscv_expand_xmode_usmul): Add new func
to expand Xmode SAT_MUL.
(riscv_expand_non_xmode_usmul): Ditto but for non-Xmode.
(riscv_expand_usmul): Add new func to implement SAT_MUL.
* config/riscv/riscv.md (usmul<mode>3): Add new pattern to match
standard name usmul.
Pan Li [Wed, 2 Jul 2025 01:59:26 +0000 (09:59 +0800)]
Widening-Mul: Support unsigned scalar SAT_MUL form 1
This patch tries to match SAT_MUL during the
widening-mul pass, i.e. the pattern below.
NT __attribute__((noinline))
sat_u_mul_##NT##_fmt_1 (NT a, NT b)
{
uint128_t x = (uint128_t)a * (uint128_t)b;
NT max = -1;
if (x > (uint128_t)(max))
return max;
else
return (NT)x;
}
where NT can be uint8_t, uint16_t, uint32_t or uint64_t.
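For reference, the same source-level pattern written as a template instead of a macro, with a few compile-time checks. This assumes a compiler with unsigned __int128; the template name sat_u_mul is hypothetical and only mirrors the shape of the function above.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

using uint128_t = unsigned __int128;  // GCC/Clang extension type

// Widen both operands, multiply, and clamp to the maximum of the narrow type.
template <typename NT>
constexpr NT sat_u_mul(NT a, NT b) {
  static_assert(std::is_unsigned<NT>::value, "NT must be an unsigned type");
  uint128_t x = (uint128_t)a * (uint128_t)b;
  NT max = static_cast<NT>(-1);
  return x > (uint128_t)max ? max : (NT)x;
}

static_assert(sat_u_mul<std::uint8_t>(16, 16) == 255, "256 saturates to 255");
static_assert(sat_u_mul<std::uint16_t>(255, 255) == 65025, "exact result");
static_assert(sat_u_mul<std::uint64_t>(UINT64_MAX, 2) == UINT64_MAX,
              "saturates at the 64-bit maximum");

inline void check_sat_u_mul() {
  assert(sat_u_mul<std::uint32_t>(65536u, 65535u) == 4294901760u);
}
```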
gcc/ChangeLog:
* match.pd: Add new match pattern for unsigned SAT_MUL.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_mul):
New decl for pattern match func.
(match_unsigned_saturation_mul): Add new func to match unsigned
SAT_MUL.
(math_opts_dom_walker::after_dom_children): Try to match
unsigned SAT_MUL on NOP.
#define DEF_SAT_U_MUL_FMT_1(NT, WT) \
NT __attribute__((noinline)) \
sat_u_mul_##NT##_from_##WT##_fmt_1 (NT a, NT b) \
{ \
WT x = (WT)a * (WT)b; \
NT max = -1; \
if (x > (WT)(max)) \
return max; \
else \
return (NT)x; \
}
* internal-fn.cc (commutative_binary_fn_p): Add new case
for SAT_MUL.
* internal-fn.def (SAT_MUL): Add new IFN_SAT_MUL.
* optabs.def (OPTAB_NL): Remove fixed point limitation.
Tomasz Kamiński [Fri, 16 May 2025 05:12:36 +0000 (07:12 +0200)]
libstdc++: Format __float128 as _Float128 only when long double is not 128 IEEE [PR120976]
For powerpc64 and sparc architectures, which have both __float128 and 128-bit long double,
__float128 is the same type as long double/__ieee128 and is already formattable.
The remaining specialization makes __float128 formattable on x86_64 via _Float128;
however, __float128 is now not formattable on x86_32 (-m32) with -mlong-double-128,
where __float128 is a distinct type from long double, which is 128-bit IEEE.
PR libstdc++/120976
libstdc++-v3/ChangeLog:
* include/std/format (formatter<__float128, _Char_T>): Define if
_GLIBCXX_FORMAT_F128 == 2.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Spencer Abson [Mon, 16 Jun 2025 16:54:04 +0000 (16:54 +0000)]
aarch64: Add support for unpacked SVE FP comparisons
This patch extends our vec_cmp expander to support partial FP modes.
We use a predicate mode that is narrower than the operation's VPRED to govern
unpacked FP operations under flag_trapping_math, so the expansion must
handle cases where the comparison's target and governing predicates have
different modes.
While such predicates enable all of the defined part of the operation, they
are not all-true. Their false bits contribute to the (trapping) behavior of
the operation, so we cannot have SVE_KNOWN_PTRUE.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (vec_cmp<mode><vpred>): Extend
to handle partial FP modes.
(@aarch64_pred_fcm<cmp_op><mode>): Likewise.
(@aarch64_pred_fcmuo<mode>): Likewise.
(*one_cmpl<mode>3): Rename to...
(@aarch64_pred_one_cmpl<mode>_z): ... this.
* config/aarch64/aarch64.cc (aarch64_emit_sve_fp_cond): Allow the
target and governing predicates to have different modes.
(aarch64_emit_sve_or_fp_conds): Likewise.
(aarch64_emit_sve_invert_fp_cond): Likewise.
(aarch64_expand_sve_vec_cmp_float): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/unpacked_fcm_1.c: New test.
* gcc.target/aarch64/sve/unpacked_fcm_2.c: Likewise.
vect: Fix VEC_WIDEN_PLUS_HI/LO choice for big-endian [PR118891]
In the tree codes and optabs, the "hi" in a vector hi/lo pair means
"most significant" and the "lo" means "least significant", with
significance following GCC's normal endian expectations. Thus on
big-endian targets, the hi part handles the first half of the elements
in memory order and the lo part handles the second half.
For tree codes, supportable_widening_operation first chooses hi/lo
pairs based on little-endian order and then uses:
if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
std::swap (c1, c2);
to adjust. However, the handling for internal functions was missing
an equivalent fixup. This led to several execution failures in vect.exp
on aarch64_be-elf.
If the hi/lo code fails, the internal function handling goes on to try
even/odd. But I couldn't see anything obvious that would put the even/
odd results back into the right order later, so there might be a latent
bug there too.
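The hi/lo selection described above can be modelled in isolation as follows. The enum and function are hypothetical stand-ins for the tree codes or internal functions, and the even/odd exception is deliberately left out:

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins for a lo/hi pair of widening operations.
enum class half_code { lo, hi };

// Returns {operation for the first half of the elements in memory order,
//          operation for the second half}.
inline std::pair<half_code, half_code> choose_halves(bool bytes_big_endian) {
  // Start from little-endian order: the first elements in memory are the
  // least significant, so "lo" handles them.
  half_code c1 = half_code::lo;
  half_code c2 = half_code::hi;
  // On big-endian the first elements are the most significant, so swap.
  if (bytes_big_endian)
    std::swap(c1, c2);
  return {c1, c2};
}

inline void check_choose_halves() {
  assert(choose_halves(false).first == half_code::lo);
  assert(choose_halves(true).first == half_code::hi);
}
```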
gcc/
PR tree-optimization/118891
* tree-vect-stmts.cc (supportable_widening_operation): Swap the
hi and lo internal functions on big-endian targets.
if (SUBREG_P (dst) && SUBREG_BYTE (dst).is_constant ())
{
bit = subreg_lsb (dst).to_constant ();
if (bit >= HOST_BITS_PER_WIDE_INT)
bit = HOST_BITS_PER_WIDE_INT - 1;
dst = SUBREG_REG (dst);
But a constant SUBREG_BYTE doesn't guarantee a constant subreg_lsb.
If the SUBREG_REG is a pair of N-bit registers on a big-endian target,
the most significant end has a SUBREG_BYTE of 0 but a subreg_lsb of N.
This N would then be non-constant for variable-length registers.
The patch fixes gcc.dg/torture/pr120276.c and other failures on
aarch64_be-elf.
gcc/
* ext-dce.cc (ext_dce_process_uses): Apply is_constant directly
to the subreg_lsb.
aarch64: Fix neon-sve-bridge.c failures for big-endian
Lowpart subregs are generally disallowed on big-endian SVE vector
registers, since the first memory element is stored at the least
significant end of the register, rather than the most significant end.
(See the comment at the head of aarch64-sve.md for details,
and aarch64_modes_compatible_p for the implementation.)
This means that arm_sve_neon_bridge.h needs to use custom define_insns
for big-endian targets, in lieu of using lowpart subregs. However,
one of those define_insns relied on the prohibited lowparts internally,
to convert an Advanced SIMD register to an SVE register. Since the
lowpart is not allowed, the lowpart_subreg would return null, leading
to a later ICE.
The simplest fix seems to be to use %Z instead, to force the Advanced
SIMD register to be written as an SVE register.
gcc/
* config/aarch64/aarch64-sve.md (@aarch64_sve_set_neonq_<mode>):
Use %Z instead of lowpart_subreg. Tweak formatting.
aarch64: Fix ZIP1 order in aarch64_expand_vector_init [PR118891]
aarch64_expand_vector_init contains some divide-and-conquer code
that tries to load the odd and even elements into 64-bit registers
and then ZIP them together. On big-endian targets, the even elements
are more significant than the odd elements and so should come second
in the ZIP.
This fixes many execution failures on aarch64_be-elf, including
gcc.c-torture/execute/pr28982a.c.
gcc/
PR target/118891
* config/aarch64/aarch64.cc (aarch64_expand_vector_init): Fix the
ZIP1 operand order for big-endian targets.
H.J. Lu [Tue, 17 Jun 2025 02:17:17 +0000 (10:17 +0800)]
x86: Improve vector_loop/unrolled_loop for memset/memcpy
1. Don't generate the loop if the loop count is 1.
2. For memset with vector on small size, use vector if small size supports
vector, otherwise use the scalar value.
3. Always expand vector-version of memset for vector_loop.
4. Always duplicate the promoted scalar value for vector_loop if it is
neither 0 nor -1.
5. Use misaligned prologue if alignment isn't needed. When misaligned
prologue is used, check if destination is actually aligned and update
destination alignment if aligned.
6. Use move_by_pieces and store_by_pieces for memcpy and memset epilogues
with the fixed epilogue size to enable overlapping moves and stores.
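To illustrate what "overlapping moves" means in point 6, here is a minimal sketch, assuming a tail of 4 to 8 bytes; the helper name is hypothetical and this is not the i386-expand.cc logic. Two fixed-size 4-byte moves cover any such length because the second one is anchored at the end and may overlap the first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy n bytes (4 <= n <= 8) with exactly two 4-byte loads and two 4-byte
// stores and no length-dependent branches.
inline void copy_tail_4_to_8(unsigned char* dst, const unsigned char* src,
                             std::size_t n) {
  assert(n >= 4 && n <= 8);
  std::uint32_t head, tail;
  std::memcpy(&head, src, 4);          // first 4 bytes
  std::memcpy(&tail, src + n - 4, 4);  // last 4 bytes, may overlap the first
  std::memcpy(dst, &head, 4);
  std::memcpy(dst + n - 4, &tail, 4);
}
```

Both loads happen before either store, so the result is the same regardless of how much the two halves overlap.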
The included tests show that codegen of vector_loop/unrolled_loop for
memset/memcpy are significantly improved. For
PR target/120670
PR target/120683
* config/i386/i386-expand.cc (expand_set_or_cpymem_via_loop):
Don't generate the loop if the loop count is 1.
(expand_cpymem_epilogue): Use move_by_pieces.
(setmem_epilogue_gen_val): New.
(expand_setmem_epilogue): Use store_by_pieces.
(expand_small_cpymem_or_setmem): Choose cpymem mode from MOVE_MAX.
For memset with vector and the size is smaller than the vector
size, first try the narrower vector, otherwise, use the scalar
value.
(promote_duplicated_reg): Duplicate the scalar value for vector.
(ix86_expand_set_or_cpymem): Always expand vector-version of
memset for vector_loop. Use misaligned prologue if alignment
isn't needed and destination isn't aligned. Always initialize
vec_promoted_val from the promoted scalar value for vector_loop.
Jakub Jelinek [Mon, 7 Jul 2025 07:17:34 +0000 (09:17 +0200)]
c++: Pedwarn on invalid decl specifiers for for-range-declaration [PR84009]
https://eel.is/c++draft/stmt.ranged#2
says that in for-range-declaration only type-specifier or constexpr
can appear. As the following testcases show, we've emitted some
diagnostics in most cases, but not for static/thread_local (the patch
handles __thread too) and register in the non-sb case.
For extern there was an error that it is both extern and has an
initializer (again, non-sb only, sb errors on extern).
The following patch diagnoses those cases with pedwarn.
I've used for-range-declaration in the diagnostics wording (there was
already a case of that for the typedef), so that in the future
we don't need to differentiate it between range for and expansion
statements.
2025-07-07 Jakub Jelinek <jakub@redhat.com>
PR c++/84009
* parser.cc (cp_parser_decomposition_declaration): Pedwarn
on thread_local, __thread or static in decl_specifiers for
for-range-declaration.
(cp_parser_init_declarator): Likewise, and also for extern
or register.
* g++.dg/cpp0x/range-for40.C: New test.
* g++.dg/cpp0x/range-for41.C: New test.
* g++.dg/cpp0x/range-for42.C: New test.
* g++.dg/cpp0x/range-for43.C: New test.
Mikael Morin [Mon, 7 Jul 2025 07:03:03 +0000 (09:03 +0200)]
fortran: Add the preliminary code of MOVE_ALLOC arguments
Add the preliminary code produced for the evaluation of the FROM and TO
arguments of the MOVE_ALLOC intrinsic before using their values.
Before this change, the preliminary code was ignored and dropped,
limiting the validity of the implementation of MOVE_ALLOC to simple
cases without preliminary code.
This change also adds the cleanup code of the same arguments. It
doesn't make any difference on the testcase though. Because of the
limited set of arguments that are allowed (variables or components
without subreference), it is possible that the cleanup code is actually
guaranteed to be empty. At least adding the cleanup code makes the
array case consistent with the scalar case.
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (conv_intrinsic_move_alloc): Add pre and
post code for the FROM and TO arguments.
Andrew Pinski [Sun, 6 Jul 2025 17:20:26 +0000 (10:20 -0700)]
crc: Error out on non-constant poly arguments for the crc builtins [PR120709]
These builtins require a constant integer for the third argument, but currently
there is an assert rather than an error. This fixes that and updates the documentation too.
It uses the same terms as are used for the __builtin_prefetch arguments.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/120709
gcc/ChangeLog:
* builtins.cc (expand_builtin_crc_table_based): Error out
instead of asserting the 3rd argument is an integer constant.
* internal-fn.cc (expand_crc_optab_fn): Likewise.
* doc/extend.texi (crc): Document requirement of the poly argument
being a constant.
gcc/testsuite/ChangeLog:
* gcc.dg/crc-non-cst-poly-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>