libstdc++-v3/ChangeLog:
* config/abi/pre/gnu.ver: Remove recent symbol swept up in
GLIBCXX_3.4.21.
* include/bits/basic_string.tcc (reserve): Guard __limit decl
under #if __cpp_exceptions to quiet warning.
Jakub Jelinek [Wed, 20 May 2026 06:49:06 +0000 (08:49 +0200)]
i386: Use vpaddq + vpermilpd for some non-const permutations [PR125357]
On Tue, May 19, 2026 at 10:30:16AM +0200, Jakub Jelinek wrote:
> On Tue, May 19, 2026 at 10:51:37AM +0300, Alexander Monakov wrote:
> > Thanks for looking at the issue, I really appreciate it. The same problem
> > exists with 64-bit lanes (V2DF/V2SI modes, we fail to utilize vpermilpd).
>
> The control in that case is in bits 1 and 65 rather than 0 and 64.
> So, in order to use vpermilpd for
> __builtin_shuffle (v2di_or_v2df, v2di);
> one would need to first shift the mask (or vpaddq with itself).
> Though, that is still shorter than what we emit right now.
PR target/125357
* config/i386/i386-expand.cc (ix86_expand_vec_perm): For TARGET_AVX
one_operand_shuffle handle also V2DImode and V2DFmode using
vpaddq and vpermilpd.
* gcc.target/i386/avx-pr125357-2.c: New test.
* gcc.target/i386/avx2-pr125357-2.c: New test.
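The bit positions discussed in the quoted exchange can be modelled in scalar code. This is only an illustrative sketch, not the code added to ix86_expand_vec_perm: the V2 type and all helper names are hypothetical. It models vpermilpd as selecting on bit 1 of each 64-bit control lane, and shows that adding the mask to itself (which is what vpaddq does) moves the __builtin_shuffle selector from bit 0 to bit 1.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of a vector with two 64-bit lanes.
using V2 = std::array<std::uint64_t, 2>;

// Model of vpermilpd on a 128-bit vector: each result lane is chosen
// by bit 1 of the corresponding 64-bit control lane.
inline V2 model_vpermilpd(const V2 &src, const V2 &ctrl)
{
  return { src[(ctrl[0] >> 1) & 1], src[(ctrl[1] >> 1) & 1] };
}

// Model of a one-operand __builtin_shuffle with two lanes: the
// selector is the mask element modulo 2, i.e. bit 0.
inline V2 model_shuffle(const V2 &src, const V2 &mask)
{
  return { src[mask[0] & 1], src[mask[1] & 1] };
}

// Doubling the mask (the effect of vpaddq mask, mask) moves bit 0
// into bit 1, which is where the vpermilpd model looks.
inline V2 shuffle_via_vpermilpd(const V2 &src, const V2 &mask)
{
  V2 ctrl = { mask[0] + mask[0], mask[1] + mask[1] };
  return model_vpermilpd(src, ctrl);
}
```

In this model, shuffle_via_vpermilpd and model_shuffle agree for every mask value, because only bit 0 of each mask lane matters to the shuffle and doubling places it in bit 1.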
liuhongt [Mon, 5 Jan 2026 02:52:23 +0000 (18:52 -0800)]
Limit outer-loop unswitching by duplicated code size
When unswitching predicates from the innermost loop, hoisting the
unswitch out to an outer loop duplicates only the outer-loop bodies;
the innermost loop is shared. Estimate the cost as
candidate_size - innermost_size and stop selecting an outer loop once
that exceeds param_max_unswitch_insns. innermost_size is hoisted out
of the walk so estimate_loop_insns is called once per level.
gcc/ChangeLog:
* tree-ssa-loop-unswitch.cc (estimate_loop_insns): New function.
(init_loop_unswitch_info): Do not select an outer loop for
unswitching when the duplicated outer-body size exceeds
param_max_unswitch_insns.
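The cost rule described above can be sketched as follows. This is a simplified model under stated assumptions, not the code in tree-ssa-loop-unswitch.cc: pick_unswitch_level and the sizes vector are hypothetical, and the real pass applies further conditions before selecting a loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model.  sizes[0] is the estimated size of the innermost
// loop; sizes[i] is the size of the i-th enclosing loop and is assumed
// to include everything nested inside it.  Returns the index of the
// outermost level that may still be selected for unswitching.
inline std::size_t pick_unswitch_level(const std::vector<unsigned> &sizes,
                                       unsigned max_unswitch_insns)
{
  if (sizes.empty())
    return 0;
  // Computed once, outside the walk over the enclosing loops.
  const unsigned innermost_size = sizes[0];
  std::size_t chosen = 0;
  for (std::size_t level = 1; level < sizes.size(); ++level)
    {
      // Only the outer-loop bodies are duplicated; the innermost loop
      // is shared, so its size is subtracted from the candidate.
      unsigned duplicated = sizes[level] - innermost_size;
      if (duplicated > max_unswitch_insns)
        break;
      chosen = level;
    }
  return chosen;
}
```

With sizes {40, 60, 200} and a limit of 50, the first enclosing loop duplicates 20 insns and is accepted, while the second would duplicate 160 and stops the walk.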
Zhou Qiankang [Mon, 18 May 2026 08:03:02 +0000 (16:03 +0800)]
LoongArch: Fix missing plugin header for cpu-features.h [PR125362]
When compiling GCC plugins that include target headers on LoongArch,
the build fails because cpu-features.h is not installed during
`make install-plugin`. The header, included by loongarch-protos.h,
was not listed in any variable that feeds PLUGIN_HEADERS. Add it to
TM_H, following the same approach used by i386 for i386-cpuinfo.h.
Kito Cheng [Tue, 19 May 2026 10:33:52 +0000 (18:33 +0800)]
testsuite: Update CRC dump scan regex for reversed crc table
Before r17-567 ("middle-end: Optimize reversed CRC table-based
implementation"), a reversed CRC without a crc_rev optab was
expanded as: reflect the input, run the non-reversed CRC table,
then reflect the result. So the table that got emitted was the
non-reversed one, and the dump printed:
;; emitting crc table crc_<N>_polynomial_<P> ...
After that patch, the middle end builds a reversed CRC table
directly and the dump prints:
Any target without a crc_rev optab hits the new path. For example,
RISC-V's bitmanip.md defines crc_rev<ANYI1:mode><ANYI:mode>4 with
no TARGET_ guard, so the optab is always there and the dump still
says "using optab for ..." -- the test passes. But x86's
crc_rev<SWI124:mode>si4 needs TARGET_CRC32, so a default x86 build
(no SSE4.2 / -march=cascadelake) has no optab, falls back to the
table, and prints the new "emitting reversed crc table" message.
The old regex only accepted "emitting crc table", so x86 fails.
These tests only call __builtin_rev_crc*, which always take the
reversed path, so the non-reversed "emitting crc table" message
cannot appear here. Just replace it with the new wording.
gcc/testsuite/ChangeLog:
* gcc.dg/crc-builtin-rev-target32.c: Match the new
"emitting reversed crc table" dump message.
* gcc.dg/crc-builtin-rev-target64.c: Likewise.
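The relationship between the two expansion strategies can be illustrated with a bitwise model. This is a sketch under assumptions, not the middle-end code: the function names are hypothetical, the CRC width is fixed at 32 bits, and one byte is processed bit by bit (a table entry is simply the precomputed result of such a step for each byte value). It shows that reflecting the state and the input, running a non-reversed step, and reflecting the result matches a direct reversed step with the reflected polynomial.

```cpp
#include <cstdint>

// Reverse the order of the low `bits` bits of v.
inline std::uint32_t reflect_bits(std::uint32_t v, unsigned bits)
{
  std::uint32_t r = 0;
  for (unsigned i = 0; i < bits; ++i)
    if (v & (1u << i))
      r |= 1u << (bits - 1 - i);
  return r;
}

// Non-reversed (MSB-first) CRC step over one byte.
inline std::uint32_t crc_forward_step(std::uint32_t crc, std::uint8_t byte,
                                      std::uint32_t poly)
{
  crc ^= static_cast<std::uint32_t>(byte) << 24;
  for (int i = 0; i < 8; ++i)
    crc = (crc & 0x80000000u) ? (crc << 1) ^ poly : crc << 1;
  return crc;
}

// Reversed (LSB-first) CRC step over one byte, using the reflected
// polynomial directly.
inline std::uint32_t crc_reversed_step(std::uint32_t crc, std::uint8_t byte,
                                       std::uint32_t reflected_poly)
{
  crc ^= byte;
  for (int i = 0; i < 8; ++i)
    crc = (crc & 1u) ? (crc >> 1) ^ reflected_poly : crc >> 1;
  return crc;
}

// The older strategy: reflect, run the non-reversed step, reflect back.
inline std::uint32_t crc_reversed_via_reflection(std::uint32_t crc,
                                                 std::uint8_t byte,
                                                 std::uint32_t poly)
{
  std::uint8_t rbyte = static_cast<std::uint8_t>(reflect_bits(byte, 8));
  return reflect_bits(crc_forward_step(reflect_bits(crc, 32), rbyte, poly), 32);
}
```

Both routes give the same value for any state and input byte, which is why only the dump wording changes and not the computed CRC.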
David Malcolm [Tue, 19 May 2026 19:04:30 +0000 (15:04 -0400)]
analyzer: fix pointer comparisons [PR125304]
PR analyzer/125304 describes a false positive from -fanalyzer on a
trivial use of std::string, due to the analyzer getting confused
about the paths for the small-string optimization versus heap-allocated
strings.
The root cause is a bug in region_svalue::eval_condition which handles
many kinds of pointer comparison, but which seems to have often been
hidden by the optimizer. Previously, it simply compared for identity
of the underlying "region" instance, returning true if identical, false
otherwise. This is wrong:
(a) for some cases, including the above one, different "region" instances
might represent the same memory (and thus we were returning "false" when
we should have returned "true")
(b) for some cases, we might not be able to determine whether different
"region" instances have the same address (and thus we were returning "false"
when we should have returned "unknown")
This patch rewrites region_svalue::eval_condition so that rather than
comparing the regions by identity, it compares their region_offset
values, taking into account their base regions and byte offsets within
those regions. Doing so requires using store::eval_alias, and so the
patch extends that to handle more cases precisely.
This new implementation fixes (a) and (b) above. There are some cases
where precision could be improved (where with the patch we return "unknown"
when we ought to return a known bool), but fixing these would be more
invasive and so is left to follow-up work.
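The difference between comparing by identity and comparing by base region plus offset can be sketched with a three-valued result. This is a deliberately simplified, hypothetical model: Addr, Tristate and eval_eq are not the analyzer's classes, and the aliasing test here is reduced to a single flag rather than the logic in store::eval_alias.

```cpp
// Hypothetical three-valued result.
enum class Tristate { False, True, Unknown };

// Hypothetical description of a pointer value: a base region plus a
// byte offset within it, where either part may be imprecise.
struct Addr
{
  int base;               // identifies the base region
  bool base_is_symbolic;  // true if the base might alias another base
  bool offset_known;      // false if the byte offset is not a known constant
  long byte_offset;
};

inline Tristate eval_eq(const Addr &a, const Addr &b)
{
  if (a.base == b.base)
    {
      // Same base: two distinct region objects with equal offsets
      // (e.g. a struct and its first field) are the same address.
      if (a.offset_known && b.offset_known)
        return a.byte_offset == b.byte_offset ? Tristate::True
                                              : Tristate::False;
      return Tristate::Unknown;
    }
  // Different bases: only answer "false" when neither base could
  // turn out to be the same memory as the other.
  if (a.base_is_symbolic || b.base_is_symbolic)
    return Tristate::Unknown;
  return Tristate::False;
}
```

In this model, case (a) corresponds to the same-base, equal-offset branch returning True, and case (b) to the branches that return Unknown instead of False.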
gcc/analyzer/ChangeLog:
PR analyzer/125304
* common.h (compare_bit_offsets_p): New decl.
(eval_region_offset_comparison): New decl.
* region-model.cc (region_model::on_assignment): Pass *this to
store::set_value to help determination of aliasing.
(region_model::set_value): Likewise.
(region_model::eval_condition): Likewise for
region_svalue::eval_condition.
* region.cc (compare_bit_offsets_p): New.
(region_offset::dump_to_pp): Dump the base region, wrapping the
whole thing in braces.
(eval_byte_offset_comparison): New.
(eval_region_offset_comparison): New.
* store.cc (store::set_value): Add "model" param and pass it to
eval_alias.
(store::eval_alias): Add "model" param and pass to eval_alias_1.
Add early return of true when checking a base region against
itself. Replace final return of "unknown" with logic that
compares the kinds of the two base regions, and may be able
to return "false" rather than "unknown".
(store::eval_alias_1): Add "model" param and pass to eval_alias.
Assert that we have two different base_regions.
(store::replay_call_summary_cluster): Pass model to set_value.
* store.h (store::set_value): Add "model" param.
(store::eval_alias): Likewise.
(store::eval_alias_1): Likewise.
* svalue.cc (region_svalue::eval_condition): Likewise.
Reimplement in terms of eval_region_offset_comparison.
* svalue.h (region_svalue::eval_condition): Add "model" param.
gcc/testsuite/ChangeLog:
PR analyzer/125304
* c-c++-common/analyzer/pointer-comparison-pr125304-eq.c: New test.
* c-c++-common/analyzer/pointer-comparison-pr125304-ge.c: New test.
* c-c++-common/analyzer/pointer-comparison-pr125304-gt.c: New test.
* c-c++-common/analyzer/pointer-comparison-pr125304-le.c: New test.
* c-c++-common/analyzer/pointer-comparison-pr125304-lt.c: New test.
* g++.dg/analyzer/pointer-casts-pr125304.C: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Marek Polacek [Thu, 14 May 2026 13:27:17 +0000 (09:27 -0400)]
c++: capture of reference to global in template [PR123536]
Thanks to DR 696 (r253266), this works:
int g;
void fn ()
{
int &c = g;
auto l = [] { c++; };
l();
}
because `c` in the lambda body is not an odr-use: we can evaluate
it to a constant, so there's no capture. But when fn is a template,
we reject the code and crash. This patch fixes both.
Outside a template, the call to maybe_constant_value in mark_use
evaluates `c` to `(int&) &g` but in a template, it remains `c`.
Then we emit an error, and crash on the error_mark_node from
process_outer_var_ref. One of the reasons is
else if (TYPE_REF_P (TREE_TYPE (expression)))
/* FIXME cp_finish_decl doesn't fold reference initializers. */
return true;
in value_dependent_expression_p but even if that changed, we still
wouldn't get the referent because decl_really_constant_value wouldn't
give it to us; the DECL_INITIAL is not a TREE_CONSTANT yet.
So I stopped trying to make this work in a template, and instead
I'm deferring the error in process_outer_var_ref to instantiation
when it's instantiation-dependent. The VAR_P check there is not
to regress the diagnostic in pr57416.C.
The mark_use hunk is to fix a crash on invalid (lambda-const14.C).
PR c++/123536
gcc/cp/ChangeLog:
* cp-tree.h (process_outer_var_ref): Remove a parameter's name.
* expr.cc (mark_use): Return if mark_rvalue_use returns
error_mark_node.
* semantics.cc (process_outer_var_ref): Return decl when it is
instantiation-dependent.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-const12.C: New test.
* g++.dg/cpp0x/lambda/lambda-const13.C: New test.
* g++.dg/cpp0x/lambda/lambda-const14.C: New test.
* g++.dg/template/local11.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Patrick Palka [Tue, 19 May 2026 17:43:10 +0000 (13:43 -0400)]
libstdc++: Fix incorrect move in flat_map::_M_try_emplace [PR125374]
PR libstdc++/125374
libstdc++-v3/ChangeLog:
* include/std/flat_map (_Flat_map_impl::_M_try_emplace): Forward
instead of unconditionally moving __k when inserting it.
* testsuite/23_containers/flat_map/1.cc (test10): New test.
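The difference between forwarding and unconditionally moving a forwarding-reference argument can be shown in isolation. This is a generic sketch, not the flat_map implementation: KeyStore and its members are hypothetical stand-ins for a container that stores keys.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a container's key storage.
struct KeyStore
{
  std::vector<std::string> keys;

  // Problematic variant: always moves, so a caller's lvalue key may be
  // left in a moved-from state after the call.
  template<typename K>
  void emplace_moving(K &&k)
  { keys.emplace_back(std::move(k)); }

  // Forwarding variant: lvalues are copied, rvalues are moved.
  template<typename K>
  void emplace_forwarding(K &&k)
  { keys.emplace_back(std::forward<K>(k)); }
};
```

With emplace_forwarding, a caller that passes a named std::string still has its original contents afterwards, while passing a temporary still avoids a copy.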
AArch64: Add scalar-to-vector costs for vec_construct
An anti-pattern found in compiled code when predicated tails were
enabled for basic block SLP vectorization was triggered by
byte-reversing patterns in source code, such as:
One reason is that the SLP pass runs before the store-merging
pass gets a chance to coalesce 4 stores into 1 and substitute a
32 bit bswap implementation. Even ignoring that, costing of the
vectorized version (cost: 4) compared to the scalar version
(also 4) was not realistic:
_2 1 times vector_store costs 1 in body
node 0x32ee6d0 1 times vec_construct costs 3 in prologue
There were a couple of contributing issues:
1. the cost of mask construction for the vector_store (ptrue) was
omitted for BB SLP, whereas the loop vectorizer explicitly charges
for it.
2. the cost of vec_construct (elements / 2 + 1) did not incorporate
any GPR-to-SIMD register transfer costs (mov, fmov).
Since the supposed cost of the vectorised code only just reached parity
with the scalar code, addressing either of the above issues would be
sufficient to prevent vectorisation (in this specific case). It is also
less risky than changing the order of passes, and less hacky than
teaching the SLP pass about store-merging.
This commit addresses only the second issue, by adding code in
vector_costs::add_stmt_cost to charge scalar_to_vec_cost for each
element of an external def of kind vec_construct (with specific
exceptions noted below). This cost is added to the base cost
already charged by aarch64_builtin_vectorization_cost for a
vec_construct (which is assumed to cover the cost of the INSR or
equivalent instructions).
This is justifiable because SIMD-to-SIMD insertions into a vector
register generally have lower latency and higher throughput than
GPR-to-SIMD insertions.
The basic structure of the code was copied from commit
90d693bdc9d71841f51d68826ffa5bd685d7f0bc, which modified the x86
backend in a similar way, but adapted to use a hash_set<tree>
instead of TREE_VISITED to guard against charging twice or more for
the same scalar op feeding an external def.
This commit assumes that constructing a vector from memory
is no more costly than the equivalent set of scalar loads (or at least
that any difference is incorporated in the cost returned by
aarch64_builtin_vectorization_cost for vec_construct). It also assumes
that constructing a vector from scalar values of floating point type,
from a BIT_FIELD_REF/lastb that extracts from a vector register, or
from the result of a call to an inbuilt reduction function, does not
incur GPR-to-SIMD register transfer costs because such scalars are
typically already in FP/SIMD registers on AArch64.
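The arithmetic behind the costing change can be sketched as follows. The base formula (elements / 2 + 1) is the one quoted above; everything else is a hypothetical illustration, including the function names and the idea of passing the number of GPR-sourced elements and the per-element scalar_to_vec_cost as plain parameters. The actual values come from the tuning tables and are not stated in this message.

```cpp
// Base cost quoted above for a vec_construct of `elements` elements.
inline unsigned base_vec_construct_cost(unsigned elements)
{
  return elements / 2 + 1;
}

// Hypothetical adjusted cost: on top of the base cost, charge a
// transfer cost for each element assumed to come from a
// general-purpose register.  Elements already in FP/SIMD registers
// (or loaded from memory) are not counted in gpr_elements.
inline unsigned adjusted_vec_construct_cost(unsigned elements,
                                            unsigned gpr_elements,
                                            unsigned scalar_to_vec_cost)
{
  return base_vec_construct_cost(elements)
         + gpr_elements * scalar_to_vec_cost;
}
```

For the four-element case in the dump above, the base cost is 3. If all four elements come from GPRs and the transfer cost is assumed to be 1 purely for illustration, the vec_construct cost becomes 7, which is already above the scalar cost of 4 before the vector store is added.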
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_call_scalar_result_in_simd_reg_p):
New function to determine probabilistically whether a gcall
produces a scalar result in a SIMD/FP register.
(aarch64_scalar_op_to_vec_p): New function to determine
whether or not to add scalar_to_vec_cost per scalar operand
from which a vector is to be constructed.
(aarch64_external_adjust_stmt_cost): New function to adjust the
cost of an SLP tree node for a vec_construct that is fed by
values defined outside the vectorized region.
(aarch64_vector_costs::add_stmt_cost): Call the new
aarch64_external_adjust_stmt_cost function if we have an SLP
node and a vector type.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/vec_construct_1.c: New test.
* gcc.target/aarch64/sve/vec_construct_2.c: New test.
* gcc.target/aarch64/sve/vec_construct_3.c: New test.
* gcc.target/aarch64/sve/vec_construct_4.c: New test.
* gcc.target/aarch64/sve/vec_construct_5.c: New test.
* gcc.target/aarch64/vec-construct-1.c: New test.
* gcc.target/aarch64/vec-construct-10.c: New test.
* gcc.target/aarch64/vec-construct-11.c: New test.
* gcc.target/aarch64/vec-construct-12.c: New test.
* gcc.target/aarch64/vec-construct-2.c: New test.
* gcc.target/aarch64/vec-construct-3.c: New test.
* gcc.target/aarch64/vec-construct-4.c: New test.
* gcc.target/aarch64/vec-construct-5.c: New test.
* gcc.target/aarch64/vec-construct-6.c: New test.
* gcc.target/aarch64/vec-construct-7.c: New test.
* gcc.target/aarch64/vec-construct-8.c: New test.
* gcc.target/aarch64/vec-construct-9.c: New test.
libstdc++: Use allocate_at_least in vector, string (P0401) [PR118030]
Implement as much of allocator<>::allocate_at_least as possible
relying solely on known alignment behavior of standard operator
new.
Use allocate_at_least in string and vector to maximize usage of
actually allocated storage, as revealed by the allocator in use.
For user-supplied allocators this may make a big difference.
Nothing is changed in include/ext/malloc_allocator or others.
They can be updated at leisure, piecemeal.
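The benefit for containers can be sketched with a toy allocator. This is a self-contained illustration, not the libstdc++ code: AllocResult, GranuleAllocator and capacity_for_request are hypothetical names, and the 16-byte granule is an assumption chosen for the example rather than a statement about operator new.

```cpp
#include <cstddef>
#include <new>

// Hypothetical result type, similar in shape to std::allocation_result.
template<typename T>
struct AllocResult
{
  T *ptr;
  std::size_t count;  // number of elements actually available
};

// Hypothetical allocator that rounds every request up to a multiple
// of an assumed 16-byte granule and reports the real element count.
template<typename T>
struct GranuleAllocator
{
  static constexpr std::size_t granule = 16;

  AllocResult<T> allocate_at_least(std::size_t n)
  {
    std::size_t bytes = n * sizeof(T);
    std::size_t rounded = (bytes + granule - 1) / granule * granule;
    T *p = static_cast<T *>(::operator new(rounded));
    return { p, rounded / sizeof(T) };
  }

  void deallocate(T *p, std::size_t)
  { ::operator delete(p); }
};

// A container would record result.count, not n, as its capacity.
inline std::size_t capacity_for_request(std::size_t n)
{
  GranuleAllocator<char> alloc;
  AllocResult<char> r = alloc.allocate_at_least(n);
  std::size_t capacity = r.count;
  alloc.deallocate(r.ptr, r.count);
  return capacity;
}
```

A request for 17 chars yields a capacity of 32 in this model, so the following 15 appends would not need to reallocate.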
libstdc++-v3/ChangeLog:
PR libstdc++/118030
* config/abi/pre/gnu.ver: Expose string::_S_allocate_at_least,
_M_create_plus symbols.
* include/bits/alloc_traits.h:
(allocate_at_least): Delegate in allocator_traits<allocator<_Tp>>
specialization to allocator<_Tp>::allocate_at_least, unconditionally;
annotate [[__gnu__::always_inline__]].
(allocate_at_least): Declare "= delete;" in allocator<void>.
* include/bits/allocator.h (allocate_at_least): Delegate to base
allocate_at_least where defined, calling with explicit base-class
qualification, picking up __new_allocator member.
* include/bits/basic_string.h:
(_Alloc_result): Define new type.
(_S_allocate_at_least): Define, using it.
(_S_allocate): Minimize for legacy ABI use only.
(_M_create_plus): Declare.
(_M_create_and_place): Define, abstracting common operations.
(assign): Use _S_allocate_at_least.
* include/bits/basic_string.tcc:
(_M_create_plus): Define.
(_M_replace, reserve): Use _S_allocate_at_least.
(_M_construct, others (3x)): Use _M_create_and_place.
(_M_construct, input iterators): Use _M_create_plus.
(_M_create, _M_assign, reserve, _M_mutate): Same.
* include/bits/memory_resource.h (allocate_at_least): Define,
document.
* include/bits/new_allocator.h (allocate_at_least): Define.
(_S_check_allocation_limit): Define.
(allocate): Use _S_check_allocation_limit.
(_S_max_size): Change from _M_max_size.
(deallocate): Refine "if constexpr" logic.
* include/bits/stl_vector.h:
(_S_max_size): Move to _Vector_base.
(_Alloc_result): Define type.
(_M_allocate_at_least): Define, using allocate_at_least where supported.
(_M_allocate): Delegate to _M_allocate_at_least.
(max_size, _S_check_init_len): Use _S_max_size as moved.
(_M_create_storage, append_range, _M_allocate_and_copy,
_M_replace_storage): Define, abstracting common operations.
(_M_replace_with): Define, likewise.
(_M_range_initialize_n): Use _M_allocate_at_least.
(_M_check_len): Improve logic.
* include/bits/vector.tcc:
(reserve, _M_fill_append, _M_range_insert): Use _M_allocate_at_least
and _M_replace_storage.
(operator=, _M_assign_aux): Use _M_replace_with.
(_M_realloc_insert, _M_realloc_append, _M_default_append, insert_range):
Use _M_allocate_at_least.
(_M_fill_insert): Use _M_replace_storage, normalize whitespace.
* testsuite/util/testsuite_allocator.h:
(allocate_at_least (3x)): Define.
(allocate): Use allocate_at_least.
* testsuite/20_util/allocator/allocate_at_least.cc: Add tests.
* testsuite/21_strings/basic_string/capacity/char/18654.cc:
Loosen capacity check.
* testsuite/21_strings/basic_string/capacity/char/shrink_to_fit.cc:
Same.
* testsuite/21_strings/basic_string/capacity/wchar_t/18654.cc: Same.
* testsuite/21_strings/basic_string/capacity/wchar_t/2.cc: Same.
* testsuite/21_strings/basic_string/capacity/wchar_t/shrink_to_fit.cc:
Same.
* testsuite/23_containers/vector/capacity/shrink_to_fit.cc: Same.
* testsuite/23_containers/vector/capacity/shrink_to_fit2.cc: Same.
* testsuite/23_containers/vector/modifiers/emplace/self_emplace.cc:
Adapt to looser reserve behavior.
OpenMP 5.0: Allow multiple clauses mapping same variable
This patch allows multiple clauses on the same construct to map the same
variable, which was not valid in OpenMP 4.5, but allowed in 5.0.
Internally, map clauses have to be deduplicated or merged before reaching the
topological sort in gimplify.cc, lest they might result in a cycle. This happens
in two places: first in the respective front-ends before any clause expansion,
then in the gimplifier just before grouping. The second pass is necessary due to
early clause expansion in the FE reintroducing some duplication (see
map-multi-2.f90).
To make duplicate detection and folding easier in Fortran, enum gfc_omp_map_op
is adjusted to have the two least significant bits mapped to FROM and TO, similar
to gomp_map_kind in gomp-constants.h.
This version of the patch only allows multiple clauses mapping the same variable
on OpenMP code; similar OpenACC code will still be rejected (for now). It also
fixes some minor issues: allow array section bounds to be null and run
target-map-multi-2 only on offload device.
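One way such an encoding could make folding of duplicate map clauses easy is shown below. This is a hypothetical sketch: MapKind and merge_map_kinds are not names from the patch, only four kinds are modelled, and the assignment of TO to bit 0 and FROM to bit 1 is an assumption made for the example. Whether the patch merges kinds exactly this way is not stated above.

```cpp
// Hypothetical map-kind encoding in which the two least significant
// bits carry the transfer directions (assumed: bit 0 = TO, bit 1 = FROM).
enum MapKind : unsigned
{
  MAP_ALLOC  = 0,  // neither direction
  MAP_TO     = 1,  // host to device
  MAP_FROM   = 2,  // device to host
  MAP_TOFROM = 3   // both directions
};

// With this encoding, folding two clauses that map the same variable
// reduces to a bitwise OR of their kinds.
inline MapKind merge_map_kinds(MapKind a, MapKind b)
{
  return static_cast<MapKind>(a | b);
}
```

For instance, a map(to: x) and a map(from: x) on the same construct would fold to the equivalent of map(tofrom: x) in this model.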
gcc/c/ChangeLog:
* c-typeck.cc (c_finish_omp_clauses): Call omp_remove_duplicate_maps
before clause expansion.
gcc/cp/ChangeLog:
* semantics.cc (finish_omp_clauses): Likewise.
gcc/ChangeLog:
* fold-const.cc (operand_compare::operand_equal_p): Handle
OMP_ARRAY_SECTION.
* gimplify.cc (gimplify_scan_omp_clauses): Call
omp_remove_duplicate_maps after partial clause expansion.
* omp-general.cc (omp_remove_duplicate_maps): New function.
* omp-general.h (omp_remove_duplicate_maps): Declare.
* omp-low.cc (install_var_field): Add new 'tree key_expr = NULL_TREE'
default parameter. Set splay-tree lookup key to key_expr instead of
var if key_expr is non-NULL. Adjust call to install_parm_decl.
Update comments.
(scan_sharing_clauses): Use clause tree expression as splay-tree key
for map/to/from and OpenACC firstprivate cases when installing the
variable field into the send/receive record type.
(lower_oacc_reductions): Adjust to find map-clause of reduction
variable, then create receiver-ref.
(lower_omp_target): Adjust to lookup var field using clause expression.
gcc/fortran/ChangeLog:
* gfortran.h (enum gfc_omp_map_op): Dedicate the two LSB to TO and FROM.
* openmp.cc (resolve_omp_clauses): Adjust to allow duplicate
mapped variables for OpenMP.
* trans-openmp.cc (gfc_trans_omp_clauses): Remove duplicates before
clause expansion.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-map-multi-1.C: New test.
* testsuite/libgomp.c-c++-common/target-map-iterators-6.c: New test.
* testsuite/libgomp.c-c++-common/target-map-multi-1.c: New test.
* testsuite/libgomp.c-c++-common/target-map-multi-2.c: New test.
* testsuite/libgomp.c-c++-common/target-map-multi-3.c: New test.
* testsuite/libgomp.c-c++-common/target-map-multi-4.c: New test.
* testsuite/libgomp.fortran/target-map-multi-1.f90: New test.
* testsuite/libgomp.fortran/target-map-multi-2.f90: New test.
* testsuite/libgomp.fortran/target-map-multi-3.f90: New test.
* testsuite/libgomp.fortran/target-map-multi-4.f90: New test.
* testsuite/libgomp.fortran/target-map-multi-5.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/clauses-2.c: Adjust testcase.
* c-c++-common/gomp/map-6.c: Adjust testcase.
* gfortran.dg/gomp/pr107214.f90: Adjust testcase.
* c-c++-common/gomp/map-multi-1.c: New test.
* c-c++-common/gomp/map-multi-2.c: New test.
* gfortran.dg/gomp/map-multi-1.f90: New test.
* gfortran.dg/gomp/map-multi-2.f90: New test.
Thomas Koenig [Tue, 19 May 2026 12:09:35 +0000 (14:09 +0200)]
PR fortran/115260 - fix data corruption on inline packing/unpacking
This patch fixes a data corruption occurring when a non-contiguous slice of an
allocatable array component was passed to a procedure expecting a g77-style
argument. The problem was the inline packing (PR fortran/88821), which went
astray when gfc_trans_scalar_assign was told to deallocate the argument upon
return.
The solution was to not pass that argument if passing a g77-style array,
in effect a one-liner.
This is a regression which goes back to all supported releases.
gcc/fortran/ChangeLog:
PR fortran/115260
* trans-expr.cc (gfc_conv_subref_array_arg): Pass false to
dealloc argument of gfc_trans_scalar_assign if we are
converting a g77-style argument.
gcc/testsuite/ChangeLog:
PR fortran/115260
* gfortran.dg/pr115260.f90: New test.
Georg-Johann Lay [Tue, 19 May 2026 11:33:00 +0000 (13:33 +0200)]
AVR: Add bitreverseqi2 insns.
Now that https://gcc.gnu.org/r17-591 has been applied, the
middle-end will express 8-bit bitreverse code in terms of
a 16-bit bitreverse. Therefore, add bitreverseqi2 insns.
PR target/50481
gcc/
* config/avr/avr.md (bitreverseqi2): New insn-and-split.
(*bitreverseqi2): New insn.
Roger Sayle [Tue, 19 May 2026 11:29:08 +0000 (07:29 -0400)]
i386: Optimize ptestz(x,-1) as ptestz(x,x) on x86
This patch, inspired by PR target/90483 and libstdc++/118416, implements
some RTL expansion-time simplifications of ptest. A common idiom for
testing a vector against zero is to use ptestz(mask,-1). Alas the code
generated for this is suboptimal, requiring materialization of an all_ones
vector. Given that ptestz(x,y) is defined as (x & y) == 0, an equivalent
form is ptestz(mask,mask), saving an instruction (if ~0 isn't available).
Consider the function:
typedef long long v2di __attribute__ ((__vector_size__ (16)));
int foo (v2di x)
{
return __builtin_ia32_ptestz128(x,~(v2di){0,0});
}
foo: xorl %eax, %eax
vptest %xmm0, %xmm0
sete %al
ret
2026-05-19 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/90483
PR libstdc++/118416
* config/i386/i386-expand.cc (ix86_expand_sse_ptest): Refactor
with optimizations for PTESTZ*, PTESTC* and PTESTNZC*, including
transforming ptestz(x,-1) into ptestz(x,x).
gcc/testsuite/ChangeLog
PR target/90483
PR libstdc++/118416
* gcc.target/i386/sse4_1-ptest-8.c: New test case.
* gcc.target/i386/sse4_1-ptest-9.c: Likewise.
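The identity behind the transformation can be checked with a scalar model. This sketch is illustrative only: V2 and model_ptestz are hypothetical names, and the model covers only the ZF result defined above as (x & y) == 0 over the whole 128-bit value.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of a 128-bit vector as two 64-bit lanes.
using V2 = std::array<std::uint64_t, 2>;

// Model of ptestz: true when (x & y) is zero across all lanes.
inline bool model_ptestz(const V2 &x, const V2 &y)
{
  return ((x[0] & y[0]) | (x[1] & y[1])) == 0;
}

// An all-ones vector, as would otherwise have to be materialized.
inline V2 all_ones()
{
  return { ~std::uint64_t{0}, ~std::uint64_t{0} };
}
```

Since x & ~0 and x & x are both x, model_ptestz (x, all_ones ()) and model_ptestz (x, x) agree for every x.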
arm: Fix MVE load/store with writeback intrinsics [PR124870]
These intrinsics (vldr*_gather_base_wb, vstr*_scatter_base_wb) lacked
modelling of memory accesses corresponding to writeback: in this case,
they both read and write memory.
Jonathan Wakely [Mon, 18 May 2026 14:44:21 +0000 (15:44 +0100)]
libstdc++: Make chrono::parse fail for bad %z [PR125369]
The chrono parsing code failed to check for errors when parsing input to
match %z. The expected input is [+-]hh[mm] but if we read less than two
valid digits for the hh or mm parts we didn't set failbit in the stream,
and used the -1 error values returned for each bad digit in the offset
value. This resulted in a "successful" parse that produced a value like
-11h or -11min for the time zone offset.
libstdc++-v3/ChangeLog:
PR libstdc++/125369
* include/bits/chrono_io.h (__detail::_Parser::operator()):
Check for errors when parsing digits for a %z format.
* testsuite/std/time/parse/125369.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
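How the -1 error values turned into a plausible-looking offset can be reproduced in a few lines. This is a minimal sketch, not the code in chrono_io.h: read_digit, hours_unchecked and hours_checked are hypothetical helpers that only model the two-digit hh part.

```cpp
// Returns the value of a decimal digit, or -1 if c is not a digit.
inline int read_digit(char c)
{
  return (c >= '0' && c <= '9') ? c - '0' : -1;
}

// Unchecked variant: combines the -1 error values as if they were
// digits, so two bad characters produce 10 * -1 + -1 = -11.
inline int hours_unchecked(char c1, char c2)
{
  return 10 * read_digit(c1) + read_digit(c2);
}

// Checked variant: reports failure instead of producing a value,
// which is where a real parser would set failbit.
inline bool hours_checked(char c1, char c2, int &hours)
{
  int d1 = read_digit(c1);
  int d2 = read_digit(c2);
  if (d1 < 0 || d2 < 0)
    return false;
  hours = 10 * d1 + d2;
  return true;
}
```

The unchecked variant is what yields the -11 seen in the bogus "successful" parses described above.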
Andrew Pinski [Sun, 17 May 2026 21:32:33 +0000 (14:32 -0700)]
tree: Move unshare_expr from gimplifier to generic tree
We use unshare_expr in many places now outside of gimple
even. So it makes sense to move the decl to tree.h.
A few sources no longer need to include gimplify.h at all;
I have not checked all of them, just a few where including
gimplify.h didn't seem to make sense.
This also moves the implementations of unshare_expr,
unshare_expr_without_location and copy_if_shared from gimplify.cc
to tree.cc to keep the headers "clean".
Jakub Jelinek [Tue, 19 May 2026 08:11:08 +0000 (10:11 +0200)]
i386: Use vpermilps for some non-const permutations [PR125357]
We don't use vpermilps insn for V4S[IF]mode variable permutations on
TARGET_AVX without TARGET_AVX512*. For TARGET_AVX512* there are plenty
of permutation instructions already. For TARGET_AVX2, the function has
special cases for one_operand_shuffle for V8SImode/V8SFmode and emits
reasonable code, but for V4SImode/V4SFmode with TARGET_AVX2 it handles
those using V8SImode/V8SFmode as two operand shuffle, which requires
2 preparation instructions, vpermd and one finalization instruction.
And for !TARGET_AVX2 && TARGET_AVX we just emit terrible code for these.
So, the following patch uses vpermilps for V4S[IF]mode one_operand_shuffle.
Trying to handle V8S[IF]mode is not worth it, for TARGET_AVX2 we already
emit good code (see above) and for !TARGET_AVX2 && TARGET_AVX V8SImode
mask is not valid vector mode, so we emit terrible code no matter what.
2026-05-19 Jakub Jelinek <jakub@redhat.com>
PR target/125357
* config/i386/i386-expand.cc (ix86_expand_vec_perm): For
one_operand_shuffle if TARGET_AVX and not TARGET_AVX512F use
vpermilps for V4SImode/V4SFmode. Formatting fix.
* gcc.target/i386/avx-pr125357.c: New test.
* gcc.target/i386/avx2-pr125357.c: New test.
Jakub Jelinek [Tue, 19 May 2026 07:29:33 +0000 (09:29 +0200)]
optabs: Handle bitreverse using widening or two bitreverses of halves [PR50481]
The following patch extends the widen_bswap and expand_doubleword_bswap
functions to handle also bitreverse, so that all the backends with
say just bitreversesi2 or bitreverse{s,d}i2 can handle also
bitreverse{q,h}i2 and bitreverse{d,t}i2 easily.
2026-05-19 Jakub Jelinek <jakub@redhat.com>
PR target/50481
* optabs.cc (widen_bswap): Add UNOPTAB argument and use it instead
of hardcoded bswap_optab. Rename to ...
(widen_bswap_or_bitreverse): ... this.
(expand_doubleword_bswap): Add UNOPTAB argument and use it instead
of hardcoded bswap_optab. Rename to ...
(expand_doubleword_bswap_or_bitreverse): ... this.
(expand_bitreverse): Use widen_bswap_or_bitreverse and
expand_doubleword_bswap_or_bitreverse.
(expand_unop): Adjust widen_bswap and expand_doubleword_bswap callers
to use new names and add an extra bswap_optab argument.
Reviewed-by: Jeffrey Law <jeffrey.law@oss.qualcomm.com>
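The two strategies named in the subject line can be illustrated on plain integers. This is a scalar sketch of the idea only, not the RTL expansion in optabs.cc: the function names are hypothetical, and bitreverse32 stands in for whatever single bitreverse the target provides.

```cpp
#include <cstdint>

// Reference 32-bit bit reversal, standing in for the one mode the
// target is assumed to support natively.
inline std::uint32_t bitreverse32(std::uint32_t x)
{
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    {
      r = (r << 1) | (x & 1u);
      x >>= 1;
    }
  return r;
}

// Widening: reverse in the wider mode, then shift the result back
// down into the narrow mode.
inline std::uint8_t bitreverse8_via_widening(std::uint8_t x)
{
  return static_cast<std::uint8_t>(bitreverse32(x) >> 24);
}

// Doubleword: reverse each half and swap the halves.
inline std::uint64_t bitreverse64_via_halves(std::uint64_t x)
{
  std::uint32_t lo = static_cast<std::uint32_t>(x);
  std::uint32_t hi = static_cast<std::uint32_t>(x >> 32);
  return (static_cast<std::uint64_t>(bitreverse32(lo)) << 32)
         | bitreverse32(hi);
}
```

The same shape applies to bswap, which is why the existing widen_bswap and expand_doubleword_bswap helpers could be generalized rather than duplicated.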
Robin Dapp [Wed, 13 May 2026 18:39:13 +0000 (20:39 +0200)]
RISC-V: Guard 64-bit vec_extract.
Currently, reduc-6.c fails on the trunk when compiling for 32 bit.
We emit a pred_extract_first of a V2DImode during legitimization of
a move. Normally, we would split that insn into two 32-bit extracts
but this splitter needs to be able to create pseudos which it can't
after reload. The insn here is created during reload when we can still
create pseudos. This patch just piggybacks on the existing handling
when no 64-bit vector elements are available (!TARGET_VECTOR_ELEN64).
Thus, we don't emit 64-bit extracts and don't need to rely on splitting
late.
PR target/125097
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_legitimize_move): Emit 32-bit
vec_extracts right away.
Robin Dapp [Fri, 23 Jan 2026 15:15:42 +0000 (16:15 +0100)]
RISC-V: Remove cbranch_all patterns.
When first introducing the cbranch_any/_all patterns I messed up the all
pattern. After giving it more thought, I removed the patterns entirely.
Our current early-break handling via autovec-opt.md is not ideal but
similar to what the patterns would give us, so no need for confusion.
The situation will improve anyway once the no-scalar-epilogue early-break
patches land.
While at it, I tried unifying the int and float comparison emitter
functions. The latter now also has "mask" capabilities.
Jeevitha [Tue, 19 May 2026 02:58:50 +0000 (21:58 -0500)]
rs6000: Adding missed ISA 3.0 atomic memory operation instructions
Changes to amo.h include the addition of the following load atomic
operations: Compare and Swap Not Equal, Fetch and Increment Bounded,
Fetch and Increment Equal, and Fetch and Decrement Bounded.
Additionally, Store Twin is added for store atomic operations.
2026-05-19 Peter Bergner <bergner@linux.ibm.com>
Jeevitha Palanisamy <jeevitha@linux.ibm.com>
Andi Kleen [Mon, 18 May 2026 15:40:53 +0000 (08:40 -0700)]
Fix masm ptwrite again
The earlier 64bit ptwrite as 32bit fix broke Intel syntax output. Handle
that too by using an alternative. In Intel syntax the instruction
data type is defined by the operands.
I'll commit it as obvious in a day or so for 15/16/trunk, unless there
are objections.
PR target/125351
gcc/ChangeLog:
* config/i386/i386.md: Use alternative to handle masm ptwrite
syntax.
Andrew Pinski [Mon, 18 May 2026 23:30:12 +0000 (16:30 -0700)]
testsuite: Fix pr112095.c for veclowering
Basically, we need to test earlier, in release_ssa rather than
optimization, because release_ssa runs before vec lowering happens.
Also, the "return a_" pattern needs to be extended to match
"<retval> = a_" as well, for vector types that are returned via memory.
Also add -Wno-psabi to avoid a warning/note about the vector
argument.
Pushed as obvious after testing on x86_64-linux-gnu with both
-m64 and -m32/-mno-sse to invoke the cases that matter here.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr112095.c: Add -Wno-psabi to the options.
Look at release_ssa instead of optimization. Match
"<retval> = a_" in addition to "return a_".
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jonathan Wakely [Mon, 18 May 2026 22:38:19 +0000 (23:38 +0100)]
libstdc++: Move std::bitset test to correct directory
I added this test in r16-3435-gbbc0e70b610f19 but I'd previously moved
the rest of the bitset tests under 20_util, in r13-2778-g4b4b51445f7f3d.
This moves the lwg4294.cc test where it belongs.
Xin Wang [Mon, 18 May 2026 22:00:15 +0000 (16:00 -0600)]
[PATCH v2] tree-optimization: Fix profile update in loop splitting (initial_true=false)
When split_loop does iteration space splitting, split_at_bb_p may
swap the guard condition so that operand 0 is always the loop IV
and operand 1 is the invariant. For example, "t < i" (LT_EXPR)
becomes "i > t" (GT_EXPR). This can cause initial_true to be
false, meaning loop1 handles iterations where the guard is false
and loop2 handles iterations where the guard is true.
The profile update code used true_edge->probability for loop1 and
its inverse for loop2. That is correct only when loop1 keeps the
true branch. When initial_true is false, loop1 keeps the false
branch and loop2 keeps the true branch, so all profile quantities
must follow those semantic edges instead of the raw true/false edge
names.
Derive loop1_edge and loop2_edge once from initial_true and use them
consistently for loop_version's then probability, the split-loop BB
count scaling, and the iteration estimate scaling. Also make
fix_loop_bb_probability take loop1/loop2 edges explicitly, rather
than assuming its arguments are true/false edges.
The bug caused BB counts in the split loops to be swapped when
initial_true is false: the loop body whose guard is forced false
(loop1, executing fewer iterations) would get the higher profile
count, and vice versa. It could also leave the precondition edge
probabilities and iteration estimates based on the wrong split edge.
gcc/ChangeLog:
* tree-ssa-loop-split.cc (fix_loop_bb_probability): Rename
parameters from true_edge/false_edge to loop1_edge/loop2_edge
and scale both loops directly from their semantic edge
probabilities.
(split_loop): Derive loop1_edge and loop2_edge from initial_true.
Use them for loop1_prob, fix_loop_bb_probability, and iteration
estimate scaling.
Jeff Law [Mon, 18 May 2026 21:24:02 +0000 (15:24 -0600)]
[RISC-V] Improve SI->DI zero/sign extension patterns for RISC-V
-- From the original submission in Oct 2025 --
So this is a slightly scaled back variant of a patch I've been working
on. I'd originally planned to handle both zero and sign extensions, but
there's some fallout with the sign extension adjustments that I'm going
to need more time to tackle. This piece stands on its own and unlocks a
subsequent patch to improve codegen. No sense in having it possibly
miss the merge window.
This patch adjusts the core zero-extension patterns as well as one
closely related combiner pattern.
For the named expanders, we now generate shift pairs if the Zba/Zbb
extensions are not available and the source operand is a REG. Things
are kept as-is for MEMs.
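As a rough illustration of the shift-pair idea (a sketch of the arithmetic only, not of the RTL the expander emits; the function names are made up for this example), a 32-bit to 64-bit zero extension of a register value can be written as a left shift followed by a logical right shift:

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend the low 32 bits of a 64-bit register value with a shift pair:
// the left shift discards bits 32..63, the logical right shift brings the
// low word back with zeros above it.
inline std::uint64_t
zext_si_to_di_shift_pair (std::uint64_t reg)
{
  return (reg << 32) >> 32;
}

// Reference result computed with a mask, for comparison.
inline std::uint64_t
zext_si_to_di_mask (std::uint64_t reg)
{
  return reg & 0xffffffffull;
}
```

Both functions agree for every input; the shift pair simply avoids needing a 32-bit mask constant in a register.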
The existing define_insn_and_split is turned into a define_insn that
only handles MEM sources. Those instructions are always available, so
no need to mess with shift pairs. This avoids regressions with a
follow-up patch which enhances a closely related combiner pattern.
That closely related combiner pattern is a define_insn_and_split which
can now turn into a simpler define_split. So that's adjusted as well.
The net is we drop 3 define_insn_and_splits and occasionally get better
code as a result. It also makes it possible to improve some additional
cases which I'll handle as a followup.
The test changes are minimal and mostly related to making sure we have
the right Zb* things enabled based on what the test relies on under the
hood. It's not even clear that part of the change is strictly necessary
anymore. I see it more as test hygiene than anything.
This has been bootstrapped and regression tested on the Pioneer which is
a good test since it doesn't have any of the Zb* extensions and thus
relies heavily on the shift-pair approach to zero extensions.
riscv32-elf and riscv64-elf have also been regression tested. The BPI
hasn't started chewing on this patch yet.
--
Subsequent changes were to the testsuite to ensure that --with-cpu or
--with-tune configure time options wouldn't impact the test results.
gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Properly cost pack insns
for Zbkb.
* config/riscv/riscv.md (zero_extendsidi2): Expand into shift pairs
when the appropriate instructions are not available.
(zero_extendhi<GPR:mode>2): Similarly.
(*zero_extendsidi2_internal): Make a simple define_insn. Only handle
MEM sources.
(*zero_extendhi<GPR2:mode>2): Similarly.
(zero_extendsidi2_shifted): Turn into a define_split and generalize
to handle more constants.
* config/riscv/predicates.md (dimode_shift_operand): New predicate.
gcc/testsuite/
* gcc.target/riscv/slt-1.c: Skip for -Oz as well. Set explicit branch
cost.
* gcc.target/riscv/zba-shNadd-04.c: Add Zbb to command line switches.
* gcc.target/riscv/zba-slliuw.c: Add Zbs to command line switches.
* gcc.target/riscv/zbs-zext.c: Add Zbs to command line switches.
* gcc.target/riscv/shift-shift-6.c: New test.
* gcc.target/riscv/shift-shift-7.c: New test.
* gcc.target/riscv/amo/a-rvwmo-load-relaxed.c: Accept lh or lhu.
* gcc.target/riscv/amo/a-ztso-load-relaxed.c: Accept lh or lhu.
* gcc.target/riscv/amo/zalasr-rvwmo-load-relaxed.c: Accept lh or lhu.
* gcc.target/riscv/amo/zalasr-ztso-load-relaxed.c: Accept lh or lhu.
* gcc.target/riscv/pr105314.c: Set explicit branch cost.
* gcc.target/riscv/pr105314-rtl.c: Set explicit branch cost.
Jeff Law [Mon, 18 May 2026 21:17:27 +0000 (15:17 -0600)]
[RISC-V] Improve ext-dce's live bit tracking for IOR/AND with a constant argument
Investigation of a regression with some RISC-V target changes exposed a clear
missed optimization in ext-dce.cc.
In particular if we mask off bits via a logical AND, then the masked off bits
are not live-in for the other input. Tracking that can in turn allow us to
eliminate more extensions. There's a similar case for logical IOR when it
unconditionally turns bits on.
So if we look at this testcase:
typedef long unsigned int size_t;
struct function
{
unsigned int curr_properties;
unsigned int last_verified;
};
extern struct function *cfun;
andi a5,a0,32
sext.w a0,a0
beq a5,zero,.L2
bseti a0,a0,15
.L2:
lui a5,%hi(cfun)
ld a5,%lo(cfun)(a5)
andi a0,a0,28
sw a0,4(a5)
ret
Note carefully the 2nd andi instruction. That's unconditionally turning off
bits 32..63 (and others). Thus those bits are not relevant/live for the
incoming value in a0. Walking backwards we find the sext.w which sign extends
from bit 31 into bits 32..63. But with bits 32..63 not being live, the sext.w
is useless.
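The live-bit reasoning above can be sketched like this. It is a simplified model, not the code in ext-dce.cc: liveness is a plain 64-bit mask here, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// For (x & mask): only bits that survive the mask can influence the result,
// so the bits of x that are live-in are the live-out bits that mask keeps.
constexpr std::uint64_t
live_in_for_and (std::uint64_t live_out, std::uint64_t mask)
{
  return live_out & mask;
}

// For (x | mask): bits that mask forces to 1 no longer depend on x.
constexpr std::uint64_t
live_in_for_ior (std::uint64_t live_out, std::uint64_t mask)
{
  return live_out & ~mask;
}

// A sign extension from bit 31 only changes bits 32..63, so it is dead
// when none of those bits are live.
constexpr bool
sign_extension_from_bit31_is_dead (std::uint64_t live)
{
  return (live >> 32) == 0;
}
```

In the example, the sw only needs the low 32 bits and the andi masks with 28, so the live bits of a0 before the andi are just 28, and the sext.w is dead.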
After this patch we get:
andi a5,a0,32
beq a5,zero,.L2
bseti a0,a0,15
.L2:
lui a5,%hi(cfun)
ld a5,%lo(cfun)(a5)
andi a0,a0,28
sw a0,4(a5)
ret
It doesn't trigger often based on my quick testing.
Bootstrapped and regression tested on various targets including x86 and riscv.
Also tested on the usual assortment of embedded targets in my tester. I'll
wait for pre-commit CI to give a final verdict.
gcc/
* ext-dce.cc (carry_backpropagate): Handle AND and IOR with a constant argument.
This param name is confusing because its meaning changed in
r16-6063-g0b786d961d4426. Update the name to better express what it now
controls, and clarify the wording of the warning that fires when the
analyzer hits that limit so that it gives more information.
gcc/analyzer/ChangeLog:
* analyzer.opt (-param=analyzer-bb-explosion-factor=): Rename to...
(-param=analyzer-supernode-explosion-factor=): ...this.
* engine.cc (exploded_graph::process_worklist): Update for change
to param name. Clarify the -Wanalyzer-too-complex message when
hitting the overall limit on enodes by also showing the number
of snodes.
gcc/ChangeLog:
* doc/analyzer.texi: Update for change in param name.
* doc/params.texi: Likewise. Clarify wording.
gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/fibonacci.c: Update for change in message
wording.
* c-c++-common/analyzer/raw-data-cst-pr117262-1.c: Update for
change in param name.
* gcc.dg/analyzer/explode-2a.c: Likewise.
* gcc.dg/analyzer/pr93032-mztools-signed-char.c: Likewise.
* gcc.dg/analyzer/pr93032-mztools-unsigned-char.c: Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Mon, 18 May 2026 20:18:38 +0000 (16:18 -0400)]
Modernize class optrecord_json_writer
No functional change intended.
gcc/ChangeLog:
* dump-context.h (dump_context::emit_optinfo): "info" is non-null,
so pass it by reference.
* dumpfile.cc (dump_context::end_any_optinfo): Update for above
change.
(dump_context::emit_optinfo): Likewise.
* optinfo-emit-json.cc: Update throughout to eliminate naked "new"
and "delete" in favor of std::make_unique and unique_ptr. Drop
redundant dtor. Use nullptr rather than NULL. Pass by
const-reference rather than by const-pointer in the places that
require non-null.
* optinfo-emit-json.h: Likewise.
* optinfo.cc (optinfo::emit_for_opt_problem): Update for above
changes.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jonathan Wakely [Sun, 17 May 2026 18:27:11 +0000 (19:27 +0100)]
libstdc++: Reduce iterations in PSTL test for Debug Mode
This test often times out, especially on machines with a large number of
cores when the tests are run with a lot of parallel jobs. I suspect that
TBB creates a lot of threads due to std::hardware_concurrency() being a
large number, but because most cores are already busy running other
tests (due to `make -jN check` with large N) the system gets
oversubscribed. In Debug Mode, the testcase runs much slower, and often
times out.
It's probably fine to just test with fewer iterations when Debug Mode is
active.
libstdc++-v3/ChangeLog:
* testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc:
Reduce iterations for debug mode.
Avi Kivity [Thu, 26 Feb 2026 17:59:41 +0000 (19:59 +0200)]
libstdc++: optimize std::uninitialized_move{,_n}() to memcpy when possible [PR121789]
std::uninitialized_move{,_n} delegates to the corresponding
std::uninitialized_copy() variant after wrapping with a move
iterator, but std::uninitialized_copy() doesn't unwrap the
move iterator, therefore losing the memcpy optimization if the
iterators were just pointers.
Fix this by unwrapping the move iterator using __miter_base().
We remove operator-() in testsuite_greedy_ops.h; otherwise it breaks
the range size computation.
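To illustrate why unwrapping matters, here is a minimal sketch of the kind of dispatch involved. It is not the libstdc++ implementation: the function name is hypothetical, it only accepts raw pointers, and it uses std::is_trivially_copyable as a simplified condition for taking the memcpy path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>
#include <utility>

// Move-construct [first, last) into uninitialized storage at out.
// With raw pointers to a trivially copyable type the whole range can be
// copied with one memcpy; a move_iterator wrapper that is not unwrapped
// hides the pointers and forces the element-by-element loop.
template <typename T>
T*
uninitialized_move_sketch (T* first, T* last, T* out)
{
  const std::size_t n = static_cast<std::size_t> (last - first);
  if (std::is_trivially_copyable<T>::value)
    {
      if (n != 0)
        std::memcpy (static_cast<void*> (out), first, n * sizeof (T));
      return out + n;
    }
  for (; first != last; ++first, ++out)
    ::new (static_cast<void*> (out)) T (std::move (*first));
  return out;
}
```

Calling it on an int array takes the memcpy branch, while a type with a user-provided move constructor falls back to the placement-new loop.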
libstdc++-v3/ChangeLog:
PR libstdc++/121789
* include/bits/stl_uninitialized.h (uninitialized_copy):
Unwrap move iterators.
* testsuite/20_util/specialized_algorithms/uninitialized_move/121789.cc:
New test.
* testsuite/util/testsuite_greedy_ops.h (greedy_ops): Comment
out operator-(T, T).
Tomasz Kamiński [Thu, 14 May 2026 12:47:03 +0000 (14:47 +0200)]
libstdc++: Include range_access.h from <valarray>
This implements <valarray> related parts of section 4.8. of P3016R6.
This is treated as DR against C++11 (to expose array begin/end), to follow
similar changes to other semi-containers that were accepted as LWG issues
and treated as DR: <optional> (LWG4131), <stacktrace> (LWG3625).
libstdc++-v3/ChangeLog:
* include/std/valarray [__cplusplus >= 201103L]: Include
<bits/range_access.h>.
* testsuite/26_numerics/valarray/range_access2.cc: Remove
<iterator> include, and add test for std::size.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Yoshinori Sato [Mon, 18 May 2026 12:50:58 +0000 (06:50 -0600)]
[PATCH] RX: The size of the mov instruction will be corrected
> Thanks. There's still work to do. I spun my tester after this change
> on the rx port:
>
> Tests that now fail, but worked before (431 tests):
>
> I won't list them all. Given how many are execution failures, there's
> likely a code generation failure in there somewhere.
> A few of them:
>
> rx-sim: gcc: gcc.c-torture/execute/20001009-2.c -O0 execution test
> rx-sim: gcc: gcc.c-torture/execute/20020614-1.c -O0 execution test
> rx-sim: gcc: gcc.c-torture/execute/20050410-1.c -O1 execution test
> rx-sim: gcc: gcc.c-torture/execute/20050410-1.c -O2 execution test
> rx-sim: gcc: gcc.c-torture/execute/20050410-1.c -O2 -flto
> -fno-use-linker-plugin -flto-partition=none execution test
> rx-sim: gcc: gcc.c-torture/execute/20050410-1.c -O3 -g execution test
> rx-sim: gcc: gcc.c-torture/execute/20050410-1.c -Os execution test
> rx-sim: gcc: gcc.c-torture/execute/921016-1.c -O0 execution test
> rx-sim: gcc: gcc.c-torture/execute/960311-1.c -O1 execution test
> rx-sim: gcc: gcc.c-torture/execute/960311-2.c -O1 execution test
> rx-sim: gcc: gcc.c-torture/execute/980617-1.c -O0 execution test
> rx-sim: gcc: gcc.c-torture/execute/990324-1.c -O0 execution test
> rx-sim: gcc: gcc.c-torture/execute/990326-1.c -O0 execution test
>
> Anyway, seems like something for Yoshinori to look into.
The code extension was causing incorrect output.
Optimization mitigated this issue, so I didn't notice it.
The attached changes now allow the test to pass.
When expanding `extendqisi2` or `extendhisi2`, incorrect operation size
instructions were sometimes output.
This update ensures that the operation size is determined reliably.
gcc/
* config/rx/rx.cc (rx_gen_move_template): Select the mode with the smallest size
for the mov instruction.
This patch generates a reversed polynomial lookup table directly,
eliminating the need for bit reflection operations. The new algorithm:
for (int i = 0; i < data_bit_size / 8; i++)
crc = (crc >> 8) ^ table[(crc ^ (data >> (i * 8))) & 0xFF];
This improves code generation for all targets using table-based reversed
CRC, as it removes the overhead of reflecting input data and CRC values.
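For readers unfamiliar with reflected tables, the following sketch shows a byte-at-a-time reflected CRC using the same update step as the loop above. It is an illustration under stated assumptions rather than what the compiler generates: it uses the standard CRC-32 parameters (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), takes a byte buffer instead of an integer, and the function names are made up.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Build the 256-entry table for a reflected (LSB-first) CRC-32.
// 0xEDB88320 is the bit-reversed form of the polynomial 0x04C11DB7.
inline std::array<std::uint32_t, 256>
make_reflected_crc32_table ()
{
  std::array<std::uint32_t, 256> table{};
  for (std::uint32_t i = 0; i < 256; ++i)
    {
      std::uint32_t c = i;
      for (int bit = 0; bit < 8; ++bit)
        c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
      table[i] = c;
    }
  return table;
}

// Process one byte per step; no bit reflection of the data or the CRC
// is needed because the table itself is already in reflected form.
inline std::uint32_t
crc32_reflected (const unsigned char *data, std::size_t len)
{
  static const std::array<std::uint32_t, 256> table
    = make_reflected_crc32_table ();
  std::uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i)
    crc = (crc >> 8) ^ table[(crc ^ data[i]) & 0xFF];
  return crc ^ 0xFFFFFFFFu;
}
```

With these parameters the nine ASCII bytes "123456789" give the well-known CRC-32 check value 0xCBF43926.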
Note on code size: one could imagine sharing a single (non-reversed) table
between programs that compute both reversed and non-reversed CRCs in order
to save space under -Os. A survey of ~10k Fedora packages (by Mariam and
Jeff Law) found no package that uses both flavors in the same binary, so
this case is not worth optimizing for.
Ref:
[1] "Reversing CRC - Theory and Practice"
https://sar.informatik.hu-berlin.de/research/publications/SAR-PR-2006-05/SAR-PR-2006-05_.pdf
Robin Dapp [Thu, 26 Mar 2026 15:09:11 +0000 (16:09 +0100)]
RISC-V: Remove blanket else in riscv_hard_regno_mode_ok.
While looking at PR124439 I noticed that we have unreachable code in
riscv_hard_regno_mode_ok. Right now we just return false for registers
that don't match one of the first four if conditions.
Robin Dapp [Thu, 26 Mar 2026 15:19:43 +0000 (16:19 +0100)]
RISC-V: Fix format specifier.
Right now we get
../../gcc/config/riscv/riscv.cc: In function ‘bool riscv_check_target_clone_version(string_slice, location_t*)’:
../../gcc/config/riscv/riscv.cc:15078:17: warning: unknown conversion type character ‘B’ in format [-Wformat=]
15078 | "invalid version %qB for %<target_clones%> attribute",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../gcc/config/riscv/riscv.cc:15078:17: warning: too many arguments for format [-Wformat-extra-args]
with a GCC 15 host compiler.
This patch replaces %qB with %<%.*s%>.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_check_target_clone_version): Fix
format specifier.
Tomasz Kamiński [Mon, 18 May 2026 07:45:46 +0000 (09:45 +0200)]
libstdc++: Make is_exhaustive const for layout_(left/right)_padded
This is specified as const in the standard, and is required to be
const-callable per the layout mapping requirements. The missing const made
calls to is_exhaustive on an mdspan with such a layout ill-formed.
libstdc++-v3/ChangeLog:
* include/std/mdspan (layout_left_padded::is_exhaustive)
(layout_right_padded::is_exhaustive): Mark as const.
* testsuite/23_containers/mdspan/layouts/mapping.cc: Test noexcept and
const-invocability for is_exhaustive, is_strided, and is_unique.
* testsuite/23_containers/mdspan/layouts/padded.cc: Test is_exhaustive on
const mapping.
* testsuite/23_containers/mdspan/layouts/stride.cc: Likewise.
* testsuite/23_containers/mdspan/mdspan.cc: Check const-invocability
for is_exhaustive, is_strided, is_unique.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jakub Jelinek [Mon, 18 May 2026 07:41:59 +0000 (09:41 +0200)]
i386: Implement bitreverse<mode>2 optab for GFNI [PR50481]
The following patch implements the bitreverse<mode>2 optab for
-mgfni -msse2 (SSE2 because apparently -mgfni doesn't imply -msse nor
-msse2).
This is done by using gf2p8affineqb insn with a special constant
which reverses bits in each byte, and for modes wider than QImode
also by doing a byteswap afterwards.
With -m64 it emits
.LC0:
.byte 1, 2, 4, 8, 16, 32, 64, -128
.byte 1, 2, 4, 8, 16, 32, 64, -128
and
movd %edi, %xmm0
gf2p8affineqb $0, .LC0(%rip), %xmm0
movd %xmm0, %eax
for __builtin_bitreverse8,
movd %edi, %xmm0
gf2p8affineqb $0, .LC0(%rip), %xmm0
movd %xmm0, %eax
rolw $8, %ax
for __builtin_bitreverse16,
movd %edi, %xmm0
gf2p8affineqb $0, .LC0(%rip), %xmm0
movd %xmm0, %eax
bswap %eax
for __builtin_bitreverse32,
movq %rdi, %xmm0
gf2p8affineqb $0, .LC0(%rip), %xmm0
movq %xmm0, %rax
bswap %rax
for __builtin_bitreverse64, and
movq %rdi, %xmm0
pinsrq $1, %rsi, %xmm0
gf2p8affineqb $0, .LC0(%rip), %xmm0
movq %xmm0, %rax
pextrq $1, %xmm0, %rdx
bswap %rax
bswap %rdx
xchgq %rdx, %rax
for __builtin_bitreverse128 (only the xchgq is unnecessary
and surprising, some RA issue).
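The two-step scheme (reverse the bits within each byte, then byteswap) can be modelled in portable code. This is only a sketch of the arithmetic: the function names are hypothetical, and the per-byte step uses shifts and masks instead of gf2p8affineqb.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the bits of one byte by swapping nibbles, then bit pairs,
// then adjacent bits. This models what the affine transform does per byte.
inline std::uint8_t
reverse_bits_in_byte (std::uint8_t b)
{
  b = static_cast<std::uint8_t> (((b & 0xF0) >> 4) | ((b & 0x0F) << 4));
  b = static_cast<std::uint8_t> (((b & 0xCC) >> 2) | ((b & 0x33) << 2));
  b = static_cast<std::uint8_t> (((b & 0xAA) >> 1) | ((b & 0x55) << 1));
  return b;
}

// Bit-reverse a 32-bit value: per-byte reversal followed by a byteswap.
inline std::uint32_t
bitreverse32_sketch (std::uint32_t x)
{
  std::uint32_t per_byte = 0;
  for (int i = 0; i < 4; ++i)
    {
      const std::uint8_t byte = static_cast<std::uint8_t> (x >> (8 * i));
      per_byte |= static_cast<std::uint32_t> (reverse_bits_in_byte (byte))
                  << (8 * i);
    }
  // Byteswap, corresponding to the bswap after gf2p8affineqb.
  return (per_byte >> 24) | ((per_byte >> 8) & 0x0000FF00u)
         | ((per_byte << 8) & 0x00FF0000u) | (per_byte << 24);
}
```

For example, 0x12345678 becomes 0x48 0x2C 0x6A 0x1E after the per-byte step and 0x1E6A2C48 after the byteswap.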
2026-05-18 Jakub Jelinek <jakub@redhat.com>
PR target/50481
* config/i386/i386-protos.h (ix86_expand_gfni_bitreverse): Declare.
* config/i386/i386-expand.cc (ix86_expand_gfni_bitreverse): New
function.
* config/i386/i386.md (bitreverse<mode>2): New expander.
* gcc.target/i386/gfni-builtin-bitreverse-1.c: New test.
Tomasz Kamiński [Wed, 13 May 2026 07:17:47 +0000 (09:17 +0200)]
libstdc++: Use on_month_day istream operator in ZoneInfo parsing. [PR124852]
This patch changes ZoneInfo parsing, to use operator>> for on_month_day
directly, and removes on_day tag. The operator>>(istream&, on_month_day)
is updated to not override on.month if the MONTH component is not present,
and set failbit instead. This allows using in >> on >> time to parse
MONTH DAY TIME.
We also handle failure to parse day number N for Www>=N or Www<=N
productions, by leaving the day part of input unchanged and setting
failbit.
PR libstdc++/124852
libstdc++-v3/ChangeLog:
* src/c++20/tzdb.cc (on_month_day::on_day_t, on_month_day::on_day):
Remove.
(operator>>(istream&, on_month_day::day_t&)): Inlined into...
(operator>>(istream&, on_month_day)): Inlined on_month_day::on_day.
Avoid modifying on.month if MONTH is not present. Report failure
on failure to parse day for LessEq / GreaterEq.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Andrew Pinski [Sat, 16 May 2026 23:17:01 +0000 (16:17 -0700)]
uncprop: small compile time optimization with switches
In the process of converting gswitch away from CASE_LABEL_EXPR,
I found a place in uncprop (like the case in dom) where we store
the whole CASE_LABEL_EXPR. This place only needed to store the value
of the case rather than the whole case expression. This does that
small optimization and adds a few comments for the next person
to understand what is going on here. It was not obvious at my
first read of the code what it was doing or what error_mark
was being used for.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-uncprop.cc (associate_equivalences_with_edges): For switches
info only store the case low value to be recorded as
the only value. Add comments.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Fri, 15 May 2026 21:40:52 +0000 (14:40 -0700)]
dom: small compile time optimization with switches
In the process of converting gswitch away from CASE_LABEL_EXPR,
I found a place in dom where we store the whole CASE_LABEL_EXPR.
This place only needed to store the value of the case rather than
the whole case expression. This does that small optimization
and adds a few comments for the next person to understand what
is going on here. It was not obvious at my first read of the code
what it was doing or what error_mark was being used for.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-dom.cc (record_edge_info): For switches
info only store the case low value to be recorded as
the only value. Add comments.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Thomas Koenig [Sat, 16 May 2026 14:37:57 +0000 (16:37 +0200)]
PR 122245: -fc-prototypes when procedure defined via INTERFACE
This simple patch emits correct prototypes when a procedure is
defined via an interface by simply checking the presence
of an interface and using its formal arglist.
gcc/fortran/ChangeLog:
PR fortran/122245
* dump-parse-tree.cc (write_formal_arglist): Take the formal
arglist from the symbol's interface if it is present.
Rainer Orth [Sat, 16 May 2026 12:17:31 +0000 (14:17 +0200)]
libgomp: Fix env.c compilation on Darwin
Darwin bootstrap is currently broken compiling libgomp/env.c:
libgomp/env.c: In function 'initialize_env':
libgomp/env.c:2476:7: error: use of logical '||' with constant operand '2097152' [-Werror=constant-logical-operand]
2476 | || GOMP_DEFAULT_STACKSIZE)
| ^~
libgomp/env.c:2476:7: note: use '|' for bitwise operation
2476 | || GOMP_DEFAULT_STACKSIZE)
| ^~
This is only seen on Darwin since this is the only target that defines a
non-zero GOMP_DEFAULT_STACKSIZE.
Bootstrapped without regressions on x86_64-apple-darwin25.5.0 and
x86_64-apple-darwin21.6.0.
Dragon Archer [Fri, 8 May 2026 18:16:15 +0000 (18:16 +0000)]
libstdc++: replace assert with __glibcxx_assert [PR125228]
Unlike `__glibcxx_assert` which is guarded by `_GLIBCXX_ASSERTIONS` and
enabled only in Debug build of libstdc++, `assert` is either always
enabled, or always disabled if manually defining `NDEBUG` before
`#include <cassert>` or `#include <assert.h>`. This not only makes
`assert` inflexible, but also introduces extra runtime overhead and/or
increased binary size in Release builds.
Uses of `assert` without `NDEBUG` introduce `__FILE__` into the final
library, and unconditionally check the assertions.
This patch replaces the uses of `assert` in ryu and debug.cc with
`__glibcxx_assert`, and removes their direct dependency on `<cassert>`.
To avoid modifying the third-party ryu headers, this patch redefines
`assert` to `__glibcxx_assert` when including the ryu headers.
libstdc++-v3/ChangeLog:
PR libstdc++/125228
* src/c++11/debug.cc: Replace assert with __glibcxx_assert,
and remove the include of <cassert>.
* src/c++17/floating_to_chars.cc: Likewise, but redefine
assert as __glibcxx_assert.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Jonathan Wakely [Fri, 15 May 2026 09:41:33 +0000 (10:41 +0100)]
libstdc++: Make configure check for atomics work on Windows [PR125312]
The changes in r16-427-g86627faec10da5 do not work for native mingw-w64
builds because the #include with a hardcoded unix-style path doesn't
work for a native mingw-w64 compiler. This results in configure
detecting that native mingw builds do not support atomic builtins for
the _Atomic_word type, causing non-inline atomics to be used for
__gnu_cxx::__exchange_and_add and __gnu_cxx::__atomic_add, which is an
unintended ABI change (and inconsistent with mingw cross-compilers where
the configure test passes and so enables the inline functions using
atomic builtins).
This attempts to solve the problem by copying the atomic_word.h header
to the current working directory, so it can be included without using an
absolute path.
libstdc++-v3/ChangeLog:
PR libstdc++/125312
* acinclude.m4 (GLIBCXX_ENABLE_ATOMIC_BUILTINS): Copy header
into cwd instead of including it via an absolute path.
* configure: Regenerate.
Jakub Jelinek [Sat, 16 May 2026 08:50:57 +0000 (10:50 +0200)]
Add __builtin_bitreverse128 [PR50481]
We already have __builtin_bswap{16,32,64,128}; the last one was
added ~6 years ago. So I think we should also have
__builtin_bitreverse128.
The following patch does that.
Note, we don't have __builtin_bswapg and I don't think we should, one can
only byteswap something which has number of bits divisible by CHAR_BIT.
For __builtin_bitreverseg that isn't a problem, but am not sure I want to
spend time handling it on say unsigned _BitInt(357). Perhaps only if there
is some real-world use-case.
2026-05-16 Jakub Jelinek <jakub@redhat.com>
PR target/50481
* doc/extend.texi (__builtin_bitreverse32, __builtin_bitreverse64):
Tweak wording for consistency with __builtin_bswap*.
(__builtin_bitreverse128): Document.
* builtins.def (BUILT_IN_BITREVERSE128): New.
* builtins.cc (expand_builtin): Handle also BUILT_IN_BITREVERSE128.
(is_inexpensive_builtin): Likewise.
* fold-const-call.cc (fold_const_call_ss): Handle also
CFN_BUILT_IN_BITREVERSE128.
* fold-const.cc (tree_call_nonnegative_warnv_p): Likewise.
* tree-ssa-ccp.cc (evaluate_stmt): Handle also BUILT_IN_BITREVERSE128.
* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p): Handle also
CFN_BUILT_IN_BITREVERSE128.
(cond_removal_in_builtin_zero_pattern): Likewise.
Jakub Jelinek [Sat, 16 May 2026 08:50:00 +0000 (10:50 +0200)]
tree-ssa-ccp: Fix up __builtin_bitreverse* handling [PR50481]
The committed __builtin_bitreverse* patch mishandled the
bitwise CCP handling, it is true that BUILT_IN_BITREVERSE* can be
handled there similarly to BUILT_IN_BSWAP*, but not exactly, for
the latter we need (and do) bswap the value and mask constants,
while for the former we obviously need to bitreverse them instead.
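A small model of the point above, on 8-bit quantities. It assumes the bitwise CCP convention that a set mask bit means "unknown"; the struct and function names are hypothetical and this is not the code in tree-ssa-ccp.cc.

```cpp
#include <cassert>
#include <cstdint>

// Known-bits lattice value: a bit set in mask is unknown, otherwise the
// corresponding bit of value is the known bit.
struct KnownBits8
{
  std::uint8_t value;
  std::uint8_t mask;
};

// Reverse the bit order of one byte.
inline std::uint8_t
bitreverse8_sketch (std::uint8_t b)
{
  b = static_cast<std::uint8_t> (((b & 0xF0) >> 4) | ((b & 0x0F) << 4));
  b = static_cast<std::uint8_t> (((b & 0xCC) >> 2) | ((b & 0x33) << 2));
  b = static_cast<std::uint8_t> (((b & 0xAA) >> 1) | ((b & 0x55) << 1));
  return b;
}

// Propagating through a bit reversal moves every bit, known or unknown,
// to its mirrored position, so value and mask are both bit-reversed.
inline KnownBits8
propagate_bitreverse8 (KnownBits8 in)
{
  return KnownBits8{ bitreverse8_sketch (in.value),
                     bitreverse8_sketch (in.mask) };
}
```

If the upper nibble is unknown and the low nibble is known to be 0x1, the result has an unknown low nibble and a known upper nibble of 0x8; a byteswap of the constants would not produce that.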
2026-05-16 Jakub Jelinek <jakub@redhat.com>
PR target/50481
* tree-ssa-ccp.cc (evaluate_stmt): Fix up
BUILT_IN_BITREVERSE{8,16,32,64} handling.
* gcc.dg/builtin-bitreverse-3.c: New test.
Reviewed-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Roger Sayle [Sat, 16 May 2026 07:44:06 +0000 (08:44 +0100)]
x86_64: Handle hard registers in TImode STV with inter-unit moves.
This patch extends the types of chains that can be converted by x86's
TImode Scalar-To-Vector (STV) pass, to include chains that originate
and/or terminate with moves from/to hard registers. Currently STV
candidate instructions explicitly exclude those that mention hard
registers.
As motivation, consider the four following functions:
__int128 a, b, c, z;
__int128 fun();
void foo_in(__int128 x) { z = (x ^ a ^ b ^ c); }
__int128 foo_out() { return (z ^ a ^ b ^ c); }
__int128 foo_inout(__int128 x) { return (x ^ a ^ b ^ c ^ z); }
void foo_fun() { z = (fun() ^ a ^ b ^ c); }
Of these, only the first, foo_in, is currently STV converted to use
SSE instructions. Its incoming argument is constructed from a concat
of two DImode registers, and support for this idiom was added in a
previous STV patch. The next two functions aren't converted because
the chain terminates with a return, which places the TImode result in
a hard register. Likewise, the final foo_fun case isn't converted as
the result from fun initiates a chain from a hard register.
This patch supports STV conversion of TImode register-to-register
moves, where either the source or the destination (but not both) is
a hard register, by implementing it as a (relatively expensive)
inter-unit move.
The one small subtlety in this patch is in the cost calculation
for inter-unit moves, which now correctly uses both sse_to_integer
and integer_to_sse costs. This patch models the transfer of double
word transfers between units as interunit_cost + COSTS_N_INSNS(1),
i.e. that the two transfers are pipelined in parallel, so that the
high latency is accounted for once [rather than 2*interunit_cost
that assumes the transfers take place strictly sequentially with
twice the single word transfer latency].
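The difference between the two cost models can be shown with a small sketch. The helper names are hypothetical, and the factor of 4 in costs_n_insns is an assumption mirroring how GCC's COSTS_N_INSNS macro scales instruction counts; the interunit cost used in the example is an arbitrary illustrative number, not a value from any tuning table.

```cpp
#include <cassert>

// Assumed to mirror GCC's COSTS_N_INSNS scaling (N instructions -> N * 4).
constexpr int
costs_n_insns (int n)
{
  return n * 4;
}

// Pipelined model used by the patch: the high transfer latency is paid
// once, plus one extra instruction for the second word.
constexpr int
pipelined_double_word_cost (int interunit_cost)
{
  return interunit_cost + costs_n_insns (1);
}

// Strictly sequential model: the transfer latency is paid twice.
constexpr int
sequential_double_word_cost (int interunit_cost)
{
  return 2 * interunit_cost;
}
```

For an illustrative interunit cost equal to three instructions, the pipelined model gives 16 while the sequential model gives 24, so the pipelined model makes conversion look cheaper whenever a single transfer costs more than one instruction.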
This revision implements Hongtao's suggestions/fixes to support
TImode values in non-general hard registers, and adds two more
test cases. Alas things turned out to be a little more complicated
than originally proposed; previously STV used PUT_MODE on TImode
pseudo registers to change their mode everywhere, but something
different is required for hard registers, which may be used in
multiple modes in a function.
To demonstrate the (additional) benefits, consider the function:
register __int128 x __asm("xmm0");
register __int128 y __asm("xmm1");
__int128 m;
2026-05-16 Roger Sayle <roger@nextmovesoftware.com>
Hongtao Liu <hongtao.liu@intel.com>
gcc/ChangeLog
* config/i386/i386-features.cc (scalar_chain): If the chain
starts with a register-to-register move from a hard register,
then the hard register's defs don't need to be converted.
(timode_scalar_chain::compute_convert_gain): Provide costs
for hard_reg-to-pseudo and pseudo-to-hard_reg moves.
Tweak speed cost of timode_concatdi_p moves.
(timode_scalar_chain::convert_insn): Add support for
hard_reg-to-pseudo and pseudo-to-hard_reg TImode transfers.
(timode_scalar_to_vector_candidate_p): Likewise.
gcc/testsuite/ChangeLog
* gcc.target/i386/avx-stv-1.c: New test case.
* gcc.target/i386/sse2-stv-3.c: Likewise.
* gcc.target/i386/sse2-stv-4.c: Likewise.
* gcc.target/i386/sse2-stv-5.c: Likewise.
Pan Li [Tue, 12 May 2026 14:28:06 +0000 (22:28 +0800)]
RISC-V: Add test cases for scalar unsigned SAT form 10
Form 10 is supported already, add test to make sure of it.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add form 10.
* gcc.target/riscv/sat/sat_u_mul-11-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-11-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-11-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-11-u8.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-11-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-11-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-11-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-11-u8.c: New test.
Pan Li [Tue, 12 May 2026 14:23:03 +0000 (22:23 +0800)]
RISC-V: Add test cases for scalar unsigned SAT form 9
Form 9 is supported already, add test to make sure of it.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add form 9.
* gcc.target/riscv/sat/sat_u_mul-10-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-10-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-10-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-10-u8.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-10-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-10-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-10-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-10-u8.c: New test.
Pan Li [Tue, 12 May 2026 13:28:32 +0000 (21:28 +0800)]
RISC-V: Add test cases for scalar unsigned SAT form 8
Form 8 is supported already, add test to make sure of it.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add form 8.
* gcc.target/riscv/sat/sat_u_mul-9-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u16-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u32-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-9-u8-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-9-u8-from-u64.c: New test.
Rainer Orth [Sat, 16 May 2026 07:07:00 +0000 (09:07 +0200)]
libgfortran: Fix libcaf_shmem build on Solaris
libcaf_shmem doesn't currently build on Solaris. Previously this went
unnoticed because the AX_PTHREADS autoconf macro erroneously didn't
detect pthreads support. Once this is fixed, compilation fails:
In file included from caf/shmem/supervisor.h:35,
from caf/shmem/alloc.c:31:
caf/shmem/sync.h:46:25: error: conflicting types for ‘lock_t’; have ‘caf_shmem_mutex’ {aka ‘struct _pthread_mutex’}
46 | typedef caf_shmem_mutex lock_t;
| ^~~~~~
In file included from /usr/include/sys/machtypes.h:12,
from /usr/include/sys/types.h:17,
from caf/shmem/thread_support.h:33,
from caf/shmem/shared_memory.h:28,
from caf/shmem/allocator.h:31,
from caf/shmem/alloc.h:28,
from caf/shmem/alloc.c:29:
/usr/include/ia32/sys/machtypes.h:50:25: note: previous declaration of ‘lock_t’ with type ‘lock_t’ {aka ‘unsigned int’}
The lock_t definition in <ia32/sys/machtypes.h> is benign: POSIX.1
reserves the _t suffix for the implementation. At the very least the
code should use a properly prefixed type instead, which this patch does
by changing the code to use caf_shmem_lock_t instead.
Bootstrapped without regressions on i386-pc-solaris2.11,
sparc-sun-solaris2.11, and x86_64-pc-linux-gnu.
Rainer Orth [Sat, 16 May 2026 07:01:42 +0000 (09:01 +0200)]
libstdc++: Remove Solaris workaround in 20_util/to_chars/float128_c++23.cc [PR107815]
As described in PR libstdc++/107815, one subtest of
20_util/to_chars/float128_c++23.cc was disabled on Solaris due to a bug
in printf(3C). This has been fixed since October 2023, so the
workaround can be removed.
Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.
also for ((~x) | y) ^ (x & y) version
_1 = ~x_2(D);
t1_4 = _1 | y_3(D);
t2_5 = x_2(D) & y_3(D);
_6 = t1_4 ^ t2_5;
return _6;
to:
int _1;
_1 = ~x_2(D);
return _1;
Bootstrapped and tested on aarch64-linux-gnu with
RUNTESTFLAGS="tree-ssa.exp".
changes since v1:
* v3: Change sf2/sg2 to sf/sg in test case
* v2:
- Update testcase to exercise GIMPLE folding
- Add additional type coverage
- Add vector and _Bool coverage
- Move code above in the file
PR tree-optimization/112095
gcc/ChangeLog:
* match.pd: Simplify ((~x) & y) ^ (x | y)
to x and ((~x) | y) ^ (x & y) to ~x.
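The two identities behind this fold can be checked exhaustively on a
narrow type. The following is only a sketch in plain C, not the match.pd
pattern itself; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ((~x) & y) ^ (x | y), which the fold simplifies to x.  */
uint8_t xor_and_or (uint8_t x, uint8_t y)
{
  return (uint8_t) ((~x & y) ^ (x | y));
}

/* ((~x) | y) ^ (x & y), which the fold simplifies to ~x.  */
uint8_t xor_or_and (uint8_t x, uint8_t y)
{
  return (uint8_t) ((~x | y) ^ (x & y));
}

/* Check both identities for every pair of 8-bit inputs.  */
void check_identities (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      {
        uint8_t a = (uint8_t) x, b = (uint8_t) y;
        assert (xor_and_or (a, b) == a);
        assert (xor_or_and (a, b) == (uint8_t) ~a);
      }
}
```

Since both expressions are purely bitwise, checking one bit position
would already suffice; the 8-bit loop just makes that visible.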
Currently without this --with-threads=pthread fails.
gcc/ChangeLog:
* config/i386/mingw-pthread.h: Rename to generic
config/mingw/mingw-pthread.h.
* config.gcc [aarch64-*-mingw*]:
Fix support for posix threading on aarch64 mingw targets.
Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
Karl Meakin [Thu, 26 Mar 2026 16:50:36 +0000 (16:50 +0000)]
Configure EditorConfig for Git commit messages
Add a section for Git commit messages to the `.editorconfig` file, so
that editors with EditorConfig support will automatically format commit
messages according to the GNU style.
Andrew Pinski [Fri, 15 May 2026 08:12:15 +0000 (01:12 -0700)]
ssa_operands: speed up GIMPLE_SWITCH handling
The operands of a GIMPLE_SWITCH are the index followed by the cases.
The cases are all CASE_LABEL_EXPRs, which operands_scanner::get_expr_operands
skips anyway, so we only need to scan the index operand.
This is also the first step in changing GIMPLE_SWITCH slightly.
gcc/ChangeLog:
* tree-ssa-operands.cc (operands_scanner::parse_ssa_operands):
Process index of the gswitch only.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Thu, 14 May 2026 04:24:43 +0000 (21:24 -0700)]
tree-cfg: Revert part of r8-546 [PR125290]
This reverts the group_case_labels_stmt part of r8-546-gca4d2851687875.
That code is in the wrong place to remove the case statements that go
directly to __builtin_unreachable. In fact, removing the case statements
makes us lose optimizations in some cases (Wuninitialized-pr107919-1.C for one).
This also fixes PR 125290 by no longer leaving around a switch that has
only a default case.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/125290
gcc/ChangeLog:
* tree-cfg.cc (group_case_labels_stmt): Remove code that was
added to remove `cases` that goto blocks of unreachable.
* tree-ssa-forwprop.cc (optimize_unreachable): Remove the
comment about switch cases being handled.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wuninitialized-pr107919-1.C: Remove xfail.
* gcc.dg/analyzer/taint-assert.c: Update for the non-removal
of block containing unreachable.
* gcc.dg/torture/pr125290-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Tomasz Kamiński [Thu, 14 May 2026 12:47:03 +0000 (14:47 +0200)]
libstdc++: Include range_access.h from <optional> and <stacktrace>
This implements the resolutions of LWG4131 and LWG3625 (also part of section
4.8 of P3016R6). As with other LWG issues, the changes are applied as DRs
to the oldest applicable standard:
* <optional> (LWG4131): C++26, since optional range support
* <stacktrace> (LWG3625): C++23, since introduction
libstdc++-v3/ChangeLog:
* include/std/optional [__cpp_lib_optional_range_support]:
Replace <bits/stl_iterator.h> include with <bits/range_access.h>.
* include/std/stacktrace: Likewise.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Andi Kleen [Thu, 14 May 2026 21:37:01 +0000 (14:37 -0700)]
pr124532: Reset musttail attribute in compound statements
A compound statement didn't reset the musttail state after the statement
with the attribute, which led to bogus errors later. Always reset it.
PR c/124532
gcc/c/ChangeLog:
* c-parser.cc (struct attr_state): Add reset method.
(c_parser_compound_statement_nostart): Rename a to astate.
Reset state before iterating statements.
Lili Cui [Fri, 15 May 2026 13:33:25 +0000 (21:33 +0800)]
testsuite: Add aarch64 SVE support to slp-reduc-15.c
Add aarch64 SVE support and use -mavx2 for x86 to support all x86
modes.
Changes:
- Add aarch64-*-* target with -march=armv8.2-a+sve
- Use -mavx2 instead of -march=x86-64-v3 to support all x86 modes
- Separate -fgimple from architecture-specific options.
Tomasz Kamiński [Fri, 15 May 2026 09:53:51 +0000 (11:53 +0200)]
libstdc++: Use IANA name for ISO-8859-1 in format tests.
Use IANA name for ISO-8859-1 as "-fexec-charset=" and add dg-require-iconv,
to make sure it is supported.
libstdc++-v3/ChangeLog:
* testsuite/std/format/debug_nonunicode.cc: Pass ISO-8859-1 as
exec-charset and make sure that it is supported.
* testsuite/std/format/fill_nonunicode.cc: Likewise.
The peeled converted IV handling added in r16-3562 incorrectly analyzes
it as [6, 6 + 254, 6 + 254 * 2] instead of [6, 4, 2]. Then VRP uses the
intersection of {6, 260, 514} and {2, 4, 6}, i.e. {6}, as the possible
value range, and propagates the constant 6 for _70.
Sign-extend the step (for example, 254 => -2) to fix the issue.
PR tree-optimization/125291
gcc/
* tree-scalar-evolution.cc (simplify_peeled_chrec): Sign-extend
the step for peeled converted IV.
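To see why the step must be sign-extended, here is a toy model in plain C.
It is not the scalar-evolution code; the function names are hypothetical
and only mirror the numbers quoted above (initial value 6, 8-bit step 254).

```c
#include <assert.h>
#include <stdint.h>

/* The IV as it really evolves: arithmetic in the 8-bit type wraps.  */
int narrow_iv (uint8_t init, uint8_t step, unsigned n)
{
  uint8_t v = init;
  for (unsigned i = 0; i < n; i++)
    v = (uint8_t) (v + step);
  return v;
}

/* The IV modelled in a wider type with the step zero-extended.  */
int wide_iv_zero_ext (uint8_t init, uint8_t step, unsigned n)
{
  return (int) init + (int) step * (int) n;
}

/* The IV modelled in a wider type with the step sign-extended.  */
int wide_iv_sign_ext (uint8_t init, uint8_t step, unsigned n)
{
  int sstep = step >= 128 ? (int) step - 256 : (int) step;
  return (int) init + sstep * (int) n;
}

/* Only the sign-extended model agrees with the narrow IV.  */
void check_steps (void)
{
  for (unsigned n = 0; n < 3; n++)
    assert (wide_iv_sign_ext (6, 254, n) == narrow_iv (6, 254, n));
}
```

With these inputs the zero-extended model yields 6, 260, 514 while the
narrow IV and the sign-extended model both yield 6, 4, 2.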
Abhishek Kaushik [Thu, 14 May 2026 11:10:35 +0000 (11:10 +0000)]
match.pd: Allow FNMA fold through conversions
The FMA folds in match.pd currently only match (negate @0) directly.
When the negated operand is wrapped in a type conversion
(e.g. (convert (negate @0))), the simplification to IFN_FNMA does not
trigger.
This prevents folding of patterns such as:
*c = *c - (v8u)(*a * *b);
when the multiply operands undergo vector type conversions before being
passed to FMA. In such cases the expression lowers to neg + mla/mad
instead of the more optimal msb/mls on AArch64 SVE, because the current
fold cannot see through the casts.
Extend the match pattern to allow conversions on the negated
operand and the second multiplicand:
The match is restricted to nop_convert on the negated operand to avoid
folding through value-changing conversions. This enables recognition of
the subtraction-of-product form even when vector element type casts are
present.
The fold is only performed when signed overflow is unobservable for
both the outer FMA operation and the inner negation. This avoids
changing sanitized overflow behaviour when looking through the nop
conversion on the negated operand.
With this change, AArch64 SVE code generation is able to select msb/mls
instead of emitting a separate neg followed by mla/mad.
This patch was bootstrapped and regression tested on aarch64-linux-gnu.
PR target/123924
gcc/
* match.pd: Allow conversions in FMA-to-FNMA fold.
gcc/testsuite/
* gcc.target/aarch64/sve/fnma_match.c: New test.
* gcc.target/aarch64/sve/pr123897.c:
Update the test to scan for FNMA in the tree dump.
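The arithmetic behind looking through a nop conversion can be sketched on
scalars. This is an illustration in plain C with made-up function names,
not the match.pd pattern or the SVE testcase; it assumes a is not
INT32_MIN so that the signed negation itself does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* FMA form: the negated operand goes through a same-width conversion
   before the multiply-add, i.e. (convert (negate a)) * b + c.  */
uint32_t fma_of_converted_neg (int32_t a, uint32_t b, uint32_t c)
{
  return (uint32_t) (-a) * b + c;
}

/* FNMA form: c - a * b, with no separate negation.  */
uint32_t fnma (int32_t a, uint32_t b, uint32_t c)
{
  return c - (uint32_t) a * b;
}

/* The two forms agree modulo 2^32 on a few sample inputs.  */
void check_fnma (void)
{
  assert (fma_of_converted_neg (3, 5, 100) == fnma (3, 5, 100));
  assert (fma_of_converted_neg (-7, 9, 1) == fnma (-7, 9, 1));
  assert (fma_of_converted_neg (0, 123, 456) == fnma (0, 123, 456));
}
```

The conversion here only reinterprets the sign, which is why the two forms
agree; a widening or narrowing conversion would not preserve the identity
in general.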
Richard Biener [Wed, 6 May 2026 08:39:18 +0000 (10:39 +0200)]
Add vector_costs::add_slp_cost grouping hook
The following simplifies the earlier RFC for making it easier
for the target to correlate multiple cost events created from
a single SLP operation. Instead of changing where we record
costs this patch only adjusts the submission part.
Targets wanting to take advantage of this can implement
add_slp_cost, handle all or selected cases, and fall back
to the default implementation to get add_stmt_cost events
for unhandled groups.
* tree-vectorizer.h (vector_costs::add_slp_cost): New.
* tree-vectorizer.cc (vector_costs::add_slp_cost): New
default version.
* tree-vect-slp.cc (add_slp_costs): Helper for dispatching
a cost vector in SLP chunks.
(vect_slp_analyze_operations): Adjust.
(li_cost_vec_cmp): Likewise.
(vect_bb_vectorization_profitable_p): Likewise.
Jakub Jelinek [Fri, 15 May 2026 07:55:40 +0000 (09:55 +0200)]
i386: Fix up *minmax<mode>3_4 [PR125308]
IEEE min/max are not commutative and in the pattern
(define_insn "ieee_<ieee_maxmin><mode>3<mask_name><round_saeonly_name>"
[(set (match_operand:VFH 0 "register_operand" "=x,v")
(unspec:VFH
[(match_operand:VFH 1 "register_operand" "0,v")
(match_operand:VFH 2 "<round_saeonly_nimm_predicate>" "xBm,<round_saeonly_constraint>")]
IEEE_MAXMIN))]
the first operand is a register and only the second one is register/memory.
Now, the *minmax<mode>3_3 define_insn_and_split does
rtx tmp = force_reg (<MODE>mode, operands[3]);
rtvec v = gen_rtvec (2, tmp, operands[2]);
operands[5] = gen_rtx_UNSPEC (<MODE>mode, v, u);
where operands[3] is the const0_operand, so operands[2] can there be
a memory, but in the *minmax<mode>3_4 case
rtx tmp = force_reg (<MODE>mode, operands[3]);
rtvec v = gen_rtvec (2, operands[2], tmp);
operands[5] = gen_rtx_UNSPEC (<MODE>mode, v, u);
operands[2] goes into the operand which must be a REG, so it
is incorrect to split it into something that won't work.
Now, I've tried both disabling the define_insn_and_split and
the following patch, the former to the latter results in
movaps a, %xmm0
pxor %xmm1, %xmm1
- cmpltps %xmm0, %xmm1
- andps %xmm1, %xmm0
+ maxps %xmm1, %xmm0
movaps %xmm0, a
ret
on the testcase, so I think it is better to match it and force_reg
(it is a pre-reload splitter) than change "nonimmediate_operand"
to "register_operand" because it won't match in that case.
2026-05-15 Jakub Jelinek <jakub@redhat.com>
PR target/125308
* config/i386/sse.md (*minmax<mode>3_4): Force also
operands[2] into a REG.
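The operand-order dependence can be seen in a scalar model. The sketch
below assumes the usual x86 rule for maxps/maxss (return the first operand
if it compares greater, otherwise the second); x86_max is a hypothetical
helper, not a GCC or intrinsic function.

```c
#include <assert.h>
#include <math.h>

/* Scalar model of the x86 max rule: the second operand is returned
   whenever the comparison is false, including for NaNs and for zeros
   of either sign.  */
float x86_max (float a, float b)
{
  return a > b ? a : b;
}

/* Swapping the operands changes the result for NaN and signed zero.  */
void check_not_commutative (void)
{
  assert (x86_max (NAN, 1.0f) == 1.0f);
  assert (isnan (x86_max (1.0f, NAN)));
  assert (signbit (x86_max (0.0f, -0.0f)));
  assert (!signbit (x86_max (-0.0f, 0.0f)));
}
```

This is why the splitter cannot simply swap the operands to put the
memory operand second; forcing it into a register keeps the order.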
Future work could optimize this on specific targets:
- ARM: lower to RBIT
- x86 with GFNI: lower to vgf2p8affineqb
https://wunkolo.github.io/post/2020/11/gf2p8affineqb-bit-reversal/
2026-05-15 Disservin <disservin.social@gmail.com>
Jakub Jelinek <jakub@redhat.com>
Jakub Jelinek [Fri, 15 May 2026 06:54:42 +0000 (08:54 +0200)]
testsuite: Add testcase for consteval-only type [PR125179]
The following testcase tests that the consteval-only computation is not
quadratic. With a loop of 50000 types, I believe this would be O(50000^2)
in the earlier implementation and so would time out.
2026-05-15 Jakub Jelinek <jakub@redhat.com>
PR c++/125179
* g++.dg/reflect/pr125179.C: New test.
Given the recent (data->flags && ff_genericize) vs.
(data->flags & ff_genericize) typo, I've looked at warning in similar
cases.
We don't warn for cases like that at all, while clang/clang++ has the
-Wconstant-logical-operand warning enabled by default.
Their behavior is:
1) only warns for rhs of &&/|| (why?)
2) doesn't warn if rhs is bool
3) for C++ warns if rhs is constant or folds into constant,
for C warns if rhs is constant or folds into constant and
that constant is not 0 or 1
4) I think it doesn't warn if rhs comes from a macro
The following patch implements similar warning with similar wording,
just provides the value of the constant, but
1) warns for lhs and rhs
2) doesn't warn if either lhs or rhs is bool
3) doesn't warn if lhs or rhs is or folds to constant 0 or 1
(but does warn if it is constant 1 of enum type in an enum which
has enumerator other than just 0/1 (i.e. poor man's boolean))
4) doesn't care if it comes from a macro or not
I think 64 && x is similarly suspicious to x && 64 and both
were likely meant to be 64 & x or x & 64. I think having
&& 1 or && 0 is common even in C++; people don't always write
&& true or && false etc., and I don't see why C++ would be different
from C in that. I think people sometimes write
if (1
#ifdef ABC
&& ABC
#endif
#ifdef DEF
&& DEF
#endif
&& 1)
and similar (or similarly with 0/true/false or ||). And the warning
is only enabled in -Wall, not by default.
2026-05-15 Jakub Jelinek <jakub@redhat.com>
PR c++/125081
gcc/
* doc/invoke.texi (Wconstant-logical-operand): Document.
gcc/c-family/
* c.opt (Wconstant-logical-operand): New option.
* c.opt.urls: Regenerate.
gcc/c/
* c-tree.h (parser_build_binary_op): Add ORIG_ARG1 argument.
* c-typeck.cc (parser_build_binary_op): Likewise. Emit
-Wconstant-logical-operand warnings.
* c-parser.cc (c_parser_binary_expression): Adjust
parser_build_binary_op caller, pass to it the original
stack[sp - 1].expr.value before c_objc_common_truthvalue_conversion.
gcc/cp/
* typeck.cc (cp_build_binary_op): Emit -Wconstant-logical-operand
warnings.
gcc/testsuite/
* c-c++-common/Wconstant-logical-operand-1.c: New test.
* c-c++-common/Wconstant-logical-operand-2.c: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: "Joseph S. Myers" <josmyers@redhat.com>
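For reference, a small sketch of the kind of typo the warning targets.
FLAG_BIT and both helpers are hypothetical names, and a compiler that
implements -Wconstant-logical-operand is expected to diagnose the first
function.

```c
#include <assert.h>

enum { FLAG_BIT = 64 };  /* hypothetical flag value */

/* Likely typo: the logical AND only asks whether flags is nonzero.  */
int has_flag_typo (unsigned flags)
{
  return flags && FLAG_BIT;
}

/* Intended test: the bitwise AND checks the specific bit.  */
int has_flag (unsigned flags)
{
  return (flags & FLAG_BIT) != 0;
}

/* The two differ as soon as some other bit is set.  */
void check_flag_tests (void)
{
  assert (has_flag_typo (1) == 1);
  assert (has_flag (1) == 0);
  assert (has_flag_typo (64) == has_flag (64));
}
```

The two functions agree for 0 and for 64, which is how such a typo can
survive light testing.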