Pan Li [Tue, 30 Jun 2026 06:25:32 +0000 (14:25 +0800)]
RISC-V: Add testcase for unsigned scalar SAT_MUL form 19
The form 19 of unsigned scalar SAT_MUL has been supported since
the previous change. Thus, add the test cases to make sure
it works well.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-20-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-20-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-20-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-20-u16-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-20-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-20-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-20-u32-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-20-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-20-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-20-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-20-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-20-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-20-u8-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-20-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-20-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-20-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-20-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-20-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-20-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-20-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-20-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-20-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-20-u8-from-u64.c: New test.
Pan Li [Tue, 30 Jun 2026 06:21:58 +0000 (14:21 +0800)]
RISC-V: Add testcase for unsigned scalar SAT_MUL form 18
The form 18 of unsigned scalar SAT_MUL has been supported since
the previous change. Thus, add the test cases to make sure
it works well.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-19-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-19-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-19-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-19-u16-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-19-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-19-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-19-u32-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-19-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-19-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-19-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-19-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-19-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-19-u8-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-19-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-19-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-19-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-19-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-19-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-19-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-19-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-19-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-19-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-19-u8-from-u64.c: New test.
Pan Li [Tue, 30 Jun 2026 06:18:31 +0000 (14:18 +0800)]
RISC-V: Add testcase for unsigned scalar SAT_MUL form 17
The form 17 of unsigned scalar SAT_MUL has been supported since
the previous change. Thus, add the test cases to make sure
it works well.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-18-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-18-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-18-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-18-u16-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-18-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-18-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-18-u32-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-18-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-18-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-18-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-18-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-18-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-18-u8-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-18-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-18-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-18-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-18-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-18-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-18-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-18-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-18-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-18-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-18-u8-from-u64.c: New test.
Pan Li [Tue, 30 Jun 2026 06:01:18 +0000 (14:01 +0800)]
RISC-V: Add testcase for unsigned scalar SAT_MUL form 16
The form 16 of unsigned scalar SAT_MUL has been supported since
the previous change. Thus, add the test cases to make sure
it works well.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-17-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-17-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-17-u16-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-17-u16-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-17-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-17-u32-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-17-u32-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-17-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-17-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-17-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-17-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-17-u8-from-u64.rv32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-17-u8-from-u64.rv64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-17-u16-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-17-u16-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-17-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-17-u32-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-17-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-17-u64-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-17-u8-from-u128.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-17-u8-from-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-17-u8-from-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-17-u8-from-u64.c: New test.
Andrew Pinski [Sat, 4 Jul 2026 02:59:19 +0000 (19:59 -0700)]
testsuite: Fix pr94589-5a.c
I did test this testcase, but I must have missed it failing somehow.
Anyway, the problem is that the scan-tree-dump does not check for u< and u>,
which can show up in some cases.
Pushed as obvious after a quick test to make sure there are no more failures.
gcc/testsuite/ChangeLog:
* gcc.dg/pr94589-5a.c: Allow for `u<` and `u>` in the scan too.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Philipp Tomsich [Fri, 8 May 2026 17:24:58 +0000 (19:24 +0200)]
tree-optimization/122569 - recognize CLZ via isolated MSB DeBruijn lookup
Recognize a CLZ idiom where the OR-cascade is followed by
(value - (value >> 1)) to isolate the MSB as a power of two (2^k), then
a DeBruijn multiply-and-shift maps 2^k back to k:
value |= value >> 1;
...
value |= value >> 32;
result = table[((value - (value >> 1)) * MAGIC) >> 58];
After the cascade, value is 2^(k+1) - 1, where k is the MSB position, so (value - (value >> 1)) is 2^k
and the multiply-and-shift is a CTZ-style DeBruijn lookup whose table
satisfies table[(magic << k) >> shift] == k.
Add match.pd pattern clz_msb_iso_table_index on top of the
msb_or_cascade_64 helper, so it only spells out the (s - (s >> 1))
isolation and the DeBruijn shape. simplify_count_zeroes validates the
table with the existing CTZ checkfn (the direct-form check) but emits
IFN_CLZ; both forms store MSB positions, so the CLZ path including
zero_val pre-compensation is unchanged.
Relax the element-type check from "precision <= 32" to "integral and
precision <= 64" so tables declared as unsigned long (64-bit on LP64)
are accepted; the values are bit positions and fit any integer type.
Only a 64-bit variant is added; all known uses (Stockfish, zstd,
cpython, the PR122569 comment 3 reproducer) are 64-bit.
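As a rough illustration of the idiom, here is a self-contained C sketch. The De Bruijn constant is an assumption: it is the one Go's math/bits package uses, and the code asserts at run time that every 6-bit index it produces is unique. The table is built from the property quoted above, table[(magic << k) >> 58] == k, rather than written out by hand. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 64-bit De Bruijn constant (the one used by Go's math/bits).  */
#define DEBRUIJN64 0x03f79d71b4ca8b09ULL

static unsigned char msb_table[64];

/* Build the table so that msb_table[(DEBRUIJN64 << k) >> 58] == k.  */
static void init_msb_table (void)
{
  unsigned char seen[64] = { 0 };
  for (unsigned k = 0; k < 64; k++)
    {
      unsigned idx = (unsigned) ((DEBRUIJN64 << k) >> 58);
      assert (!seen[idx]);	/* De Bruijn property: every index is unique.  */
      seen[idx] = 1;
      msb_table[idx] = (unsigned char) k;
    }
}

/* Position of the most significant set bit of VALUE (VALUE != 0).  */
static unsigned msb_position (uint64_t value)
{
  value |= value >> 1;
  value |= value >> 2;
  value |= value >> 4;
  value |= value >> 8;
  value |= value >> 16;
  value |= value >> 32;
  /* VALUE is now 2^(k+1) - 1, so VALUE - (VALUE >> 1) is 2^k.  */
  return msb_table[((value - (value >> 1)) * DEBRUIJN64) >> 58];
}
```

Because the table stores MSB positions, the CLZ of a nonzero value is 63 - msb_position (value). For value == 0 the lookup yields msb_table[0], which is 0, the same as for value == 1. That is why the zero case needs separate handling.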
gcc/ChangeLog:
PR tree-optimization/122569
* match.pd (clz_msb_iso_table_index): New match pattern.
* tree-ssa-forwprop.cc (gimple_clz_msb_iso_table_index): Declare.
(simplify_count_zeroes): Recognize the new pattern; route its
table validation through the CTZ checkfn. Relax the element
type check to accept integer types up to 64 bits.
gcc/testsuite/ChangeLog:
PR tree-optimization/122569
* gcc.dg/tree-ssa/pr122569-3.c: New test.
Philipp Tomsich [Fri, 8 May 2026 17:22:04 +0000 (19:22 +0200)]
match.pd: factor MSB OR-cascade out of clz_table_index
Factor the 5- and 6-stage MSB OR-cascade -- which sets every bit from 0
to the input's MSB -- into two helper match patterns msb_or_cascade_32
and msb_or_cascade_64, and rewrite the 32-bit and 64-bit clz_table_index
patterns as one-liners over them. The helpers retain the
integral/unsigned/precision and shift-constant checks, so this is a
no-functional-change refactor.
This prepares for a follow-up that recognises a CLZ idiom isolating the MSB
before the DeBruijn multiply, which reuses msb_or_cascade_64 directly.
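For reference, the 5-stage 32-bit cascade described above can be written in C as follows. The check function spells out the property the helper relies on: every bit from 0 up to the input's MSB is set, and nothing above it. Both functions are hypothetical C counterparts of the msb_or_cascade_32 match helper, not code from match.pd.

```c
#include <assert.h>
#include <stdint.h>

/* 5-stage OR-cascade: smears the MSB of X into every lower bit.  */
static uint32_t or_cascade_32 (uint32_t x)
{
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  return x;
}

/* Check that the result has the form 2^n - 1 with the same MSB as X.  */
static void check_cascade_32 (uint32_t x)
{
  uint32_t r = or_cascade_32 (x);
  assert ((r & (uint32_t) (r + 1)) == 0);	/* r is 2^n - 1 ...  */
  assert ((r | x) == r);			/* ... covers every bit of x ...  */
  assert (x == 0 || (r >> 1) < x);		/* ... and nothing above its MSB.  */
}
```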
gcc/ChangeLog:
* match.pd (msb_or_cascade_32, msb_or_cascade_64): New match
helpers.
(clz_table_index): Rewrite the 32-bit and 64-bit forms to use
the cascade helpers.
Andrew Pinski [Tue, 30 Jun 2026 19:13:05 +0000 (12:13 -0700)]
match: Simplify `(a CMP1 b) AND/IOR (a CMP2 b)` [PR126042]
This finishes up the simplification of most comparisons
outside of reassociation, including, but not limited to,
many floating-point comparisons.
Instead of redoing what is done in fold-const.cc's combine_comparisons,
this reuses combine_comparisons to find the new CMP.
With `-fno-trapping-math`, this also allows optimizing `<=>`,
which was not possible before.
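To make the combination step concrete, the following C sketch models a comparison as the set of outcomes (less, equal, greater, unordered) for which it is true. Combining two comparisons of the same operands with AND or IOR then becomes intersection or union of those sets. This is only an assumed, simplified model of the idea behind combine_comparisons, and the enum names and helpers are hypothetical. The real code must also decide whether the combined comparison is valid under the NaN and trapping-math settings, which this sketch ignores.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Possible outcomes of comparing A with B, one bit each.  */
enum { OUT_LT = 1, OUT_EQ = 2, OUT_GT = 4, OUT_UNORD = 8 };

/* A predicate is the set of outcomes for which it is true.  */
enum
{
  CMP_LT = OUT_LT,
  CMP_EQ = OUT_EQ,
  CMP_LE = OUT_LT | OUT_EQ,
  CMP_GT = OUT_GT,
  CMP_GE = OUT_GT | OUT_EQ,
  CMP_LTGT = OUT_LT | OUT_GT,
  CMP_NE = OUT_LT | OUT_GT | OUT_UNORD
};

static unsigned outcome (double a, double b)
{
  if (isnan (a) || isnan (b))
    return OUT_UNORD;
  return a < b ? OUT_LT : a == b ? OUT_EQ : OUT_GT;
}

static bool eval_cmp (unsigned pred, double a, double b)
{
  return (pred & outcome (a, b)) != 0;
}

/* (a P1 b) AND (a P2 b) is a (P1 & P2) b; IOR likewise uses P1 | P2.  */
static void check_combine (unsigned p1, unsigned p2, double a, double b)
{
  assert ((eval_cmp (p1, a, b) && eval_cmp (p2, a, b))
	  == eval_cmp (p1 & p2, a, b));
  assert ((eval_cmp (p1, a, b) || eval_cmp (p2, a, b))
	  == eval_cmp (p1 | p2, a, b));
}
```

In this model, (a < b) | (a == b) becomes a <= b, and (a <= b) & (a >= b) becomes a == b. Both results also hold when an operand is a NaN. (a < b) & (a > b) becomes the empty set, i.e. false.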
Changes since v1:
* v2: Fix some typos. Add a C testcase.
* fold-const.cc (combine_comparisons): Split into
2 versions. Also handle BIT_AND_EXPR and BIT_IOR_EXPR.
* fold-const.h (combine_comparisons): New declaration.
* match.pd (`(a CMP1 b) BITOP (a CMP2 b)`): New pattern.
gcc/testsuite/ChangeLog:
* g++.dg/opt/pr94589-5a.C: New test.
* gcc.dg/pr94589-5.c: Explicitly enable -ftrapping-math.
* gcc.dg/pr94589-5a.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jakub Jelinek [Fri, 3 Jul 2026 18:45:07 +0000 (20:45 +0200)]
c++: Fix structured binding mangling during error recovery [PR126057]
The following testcase ICEs during error recovery. We try to
mangle a structured binding base variable, but because it is
erroneous, the mangling ICEs as it can't find the corresponding
structured bindings.
Now, we already have a hack in cp_finish_decomp for when things are
erroneous: we set the assembler name to <decomp> so that mangling isn't done.
But we do that only for DECL_NAMESPACE_SCOPE_P bases, while
block scope static structured bindings can be mangled too.
Also, during instantiation, if tsubst_decomp_names fails, we don't
call cp_finish_decomp at all, so in that case we also need to
avoid mangling the structured binding base.
2026-07-03 Jakub Jelinek <jakub@redhat.com>
PR c++/126057
* decl.cc (cp_finish_decomp): Set assembler name to
<decomp> during error recovery whenever TREE_STATIC
rather than just DECL_NAMESPACE_SCOPE_P.
* pt.cc (tsubst_stmt): If tsubst_decomp_names fails,
set assembler name to <decomp>.
The following patch attempts to implement the C++29 P2953R5
Adding restrictions to defaulted assignment operator functions
paper.
The paper seems misnamed to me, because in some cases it also changes
the validity of defaulted constructors.
The patch calls maybe_delete_defaulted_fn
also for the FUNCTION_RVALUE_QUALIFIED case for C++29. Then, for C++29,
maybe_delete_defaulted_fn emits an error rather than making the
function deleted in most cases. The exception is the
[dcl.fct.def.default]/(2.5) case, which ought to still be deleted
rather than ill-formed (but only if that is the sole change that
is not on the whitelist of possible differences).
I've tried to include all the tests I found in the paper (referenced or
directly in it), with the exception of the https://gcc.gnu.org/PR86646
case. I also added some further ones and tweaked anything in the existing
tests that behaves differently for -std=c++29 with the patch.
2026-07-03 Jakub Jelinek <jakub@redhat.com>
PR c++/125826
* method.cc: Implement C++29 P2953R5 - Adding restrictions to
defaulted assignment operator functions.
(maybe_delete_defaulted_fn): For C++29, error instead of
deleting always, with the exception of F1 having parmtype
const C & and F2 having implicit_parmtype C & and no other
non-permitted changes. Move checks whether defaulted fn
should be deleted or ill-formed at all from defaulted_late_check
to this function. Also error for C++29 if
FUNCTION_RVALUE_QUALIFIED.
(defaulted_late_check): Call maybe_delete_defaulted_fn
unconditionally.
* g++.dg/cpp0x/defaulted51.C: Adjust expected diagnostics
for C++29.
* g++.dg/cpp0x/defaulted55.C: Likewise.
* g++.dg/cpp0x/defaulted56.C: Likewise.
* g++.dg/cpp0x/defaulted57.C: Likewise.
* g++.dg/cpp0x/defaulted63.C: Likewise.
* g++.dg/cpp0x/defaulted64.C: Likewise.
* g++.dg/cpp0x/defaulted65.C: Likewise.
* g++.dg/cpp0x/defaulted66.C: Likewise.
* g++.dg/cpp0x/defaulted67.C: Likewise.
* g++.dg/cpp0x/defaulted68.C: Likewise.
* g++.dg/cpp1y/defaulted2.C: Likewise.
* g++.dg/cpp29/defaulted1.C: New test.
* g++.dg/cpp29/defaulted2.C: New test.
* g++.dg/cpp29/defaulted3.C: New test.
* g++.dg/cpp29/defaulted4.C: New test.
* g++.dg/cpp29/defaulted5.C: New test.
* g++.dg/cpp29/defaulted6.C: New test.
Jakub Jelinek [Fri, 3 Jul 2026 18:41:45 +0000 (20:41 +0200)]
c++: Fix up ICEs with some metafns with non-dependent args in templates [PR126036]
The following testcase ICEs, because potential_constant_expression_1 handles
some forms of CAST_EXPR, but cxx_eval_constant_expression doesn't.
Normally, if we have e.g. a non-dependent compound literal in a template,
finish_compound_literal will create a CAST_EXPR, but
fold_non_dependent_expr_template -> instantiate_non_dependent_expr_internal
will fold that away, so constexpr.cc evaluation doesn't see it.
reflect.cc calls finish_compound_literal in 3 spots: one to create
std::array <type, 0> {}, another when creating the
std::meta::exception object to throw, and the last when creating the
std::vector <std::meta::info> object to return from various metafns.
If that is done while processing_template_decl, finish_compound_literal
will again return a CAST_EXPR. Unlike the usual case, though, this happens during
constant evaluation, so instantiate_non_dependent_expr_internal
will not be invoked on it to clean that up.
Now, in the get_meta_exception_object case, we already have
  /* Don't throw in a template. */
  if (processing_template_decl)
    {
      *non_constant_p = true;
      return NULL_TREE;
    }
This patch uses the same approach (i.e. it avoids folding even non-dependent
metafn calls that would need finish_compound_literal while
processing_template_decl) to fix this.
I think in both cases it is fine to defer the constant evaluation,
as the return type from those metafns is not dependent (std::meta::info
in the reflect_constant_array case, std::vector <std::meta::info>
otherwise).
2026-07-03 Jakub Jelinek <jakub@redhat.com>
PR c++/126036
* reflect.cc (get_range_elts): Avoid calling finish_compound_literal
when processing_template_decl, instead set *non_constant_p and
return NULL_TREE.
(process_metafunction): Likewise.
AVR: Adding +/-1 to a lower reg doesn't need a scratch.
Adding +/-1 to a lower register can be performed by sequences like
sec
adc r14, __zero_reg__
adc r15, __zero_reg__
for +1, and
sec
sbc r14, __zero_reg__
sbc r15, __zero_reg__
for -1. Neither needs a scratch reg. The code size is unchanged, but
the register pressure goes down.
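The carry trick can be checked with a small C model. Background, not from the commit: on AVR, immediate instructions such as SUBI/SBCI only accept r16 to r31. That is why adding a constant to a lower register such as r14 otherwise needs a scratch register. The sketch below is hypothetical. It simulates SEC followed by ADC or SBC with __zero_reg__ (assumed to hold 0) on a 16-bit value held in r15:r14.

```c
#include <assert.h>
#include <stdint.h>

/* ADC Rd, Rr: Rd + Rr + C, carry out of bit 7.  */
static uint8_t adc (uint8_t rd, uint8_t rr, unsigned *carry)
{
  assert (*carry <= 1);
  unsigned sum = (unsigned) rd + rr + *carry;
  *carry = sum >> 8;
  return (uint8_t) sum;
}

/* SBC Rd, Rr: Rd - Rr - C, carry set on borrow.  */
static uint8_t sbc (uint8_t rd, uint8_t rr, unsigned *carry)
{
  assert (*carry <= 1);
  int diff = (int) rd - rr - (int) *carry;
  *carry = diff < 0;
  return (uint8_t) diff;
}

/* sec; adc r14, __zero_reg__; adc r15, __zero_reg__  */
static uint16_t add_one (uint16_t x)
{
  unsigned c = 1;				/* sec */
  uint8_t lo = adc ((uint8_t) x, 0, &c);
  uint8_t hi = adc ((uint8_t) (x >> 8), 0, &c);
  return (uint16_t) (hi << 8 | lo);
}

/* sec; sbc r14, __zero_reg__; sbc r15, __zero_reg__  */
static uint16_t sub_one (uint16_t x)
{
  unsigned c = 1;				/* sec */
  uint8_t lo = sbc ((uint8_t) x, 0, &c);
  uint8_t hi = sbc ((uint8_t) (x >> 8), 0, &c);
  return (uint16_t) (hi << 8 | lo);
}
```

Setting the carry once with SEC supplies the +1 (or -1) to the low byte. Adding or subtracting the zero register then only propagates the carry or borrow, so no register has to hold the constant.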
gcc/
* config/avr/avr.cc (avr_out_plus_1): Handle +/-1 on the
lower regs without needing a scratch.
* config/avr/avr.md (add<mode>3_clobber, *add<mode>3_clobber)
(add<mode>3, *add<mode>3, addpsi3, *addpsi3): Add constraint
alternative "Y01 Ym1" for +/-1 without scratch.
libstdc++: fix allocate_at_least test for small alignments [PR126072]
A test for P0401 allocate_at_least fails on target cris-elf,
which has a default allocator with alignment 4. This patch
adjusts tests to accommodate alignments down to 1, and removes
assumptions about short int.
Tested on x86 with -m64 and -m32. Assistance is needed with testing on cris-elf.
OpenMP, Fortran: Fix indentation in resolve_omp_clauses_aff_dep_map_cache
Commit basepoints/gcc-17-1747-ga7724fcb5f4 moved a large block of code
from resolve_omp_clauses to a new function. I noticed that the
indentation of some nested switch statements looked odd when trying to
rebase some patches on top of this, and Tobias independently had
addressed this as part of his own WIP followup patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2026-June/721741.html. This
patch addresses just the indentation problem and doesn't include any
functional changes.
WHR [Sat, 31 Aug 2024 06:26:02 +0000 (06:26 +0000)]
libssp: Include 'stdlib.h' for using alloca(3) [PR116547]
As GCC now treats implicit function declarations as errors instead of
warnings, compilation of libssp has been broken on some operating systems.
The following error is from an x86_64-unknown-freebsd11 system:
[...]/libssp/ssp.c: In function 'fail':
[...]/libssp/ssp.c:134:17: error: implicit declaration of function 'alloca' [-Wimplicit-function-declaration]
134 | p = buf = alloca (len);
| ^~~~~~
The same error occurs on amd64-unknown-openbsd7.9.
Most operating systems specify that 'stdlib.h' should be included to get
the declaration of alloca(3).
PR other/116547
* ssp.c: Include stdlib.h for alloca(3).
Co-authored-by: Thomas Schwinge <tschwinge@baylibre.com>
Tomasz Kamiński [Wed, 24 Jun 2026 15:16:45 +0000 (17:16 +0200)]
libstdc++: Reserve _Pres_type value for prefixed hexadecimal floating presentation
LWG 4515, "format: a and A should insert the 0x or 0X prefix",
points out that format currently provides no way to emit a
prefixed hexadecimal presentation for floating-point values. While changing
the output for a/A was not approved during the Brno meeting, producing
printf-equivalent output is desired functionality.
This patch pre-emptively introduces and handles additional _Pres_type
values for expressing this functionality. The values
are not currently user-facing (there is no corresponding format
specifier), but adding them now will avoid problems caused by linking
TUs compiled with older versions, and will allow the change to be handled as a DR
(if necessary).
Currently _Pres_p/_Pres_P are used as placeholders for the values
(matching their behavior for pointers); however, they can be renamed
in the future (only the value is relevant).
Note that for __formatter_int, the behavior of P/p can already be
expressed by setting _Pres_X/_Pres_x and _M_alt. Consequently, the
existing uses of these names (as aliases for X/x) in __formatter_ptr
were adjusted accordingly.
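For comparison, here is a C sketch of printf's behavior. It relies only on the C standard rule that %a and %A print a hexadecimal floating-point value with a 0x or 0X prefix placed after any sign. The helper name is hypothetical, and so is its optional stripping of the prefix, which mimics an unprefixed presentation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format V with %a (or %A when UPPER); if !PREFIXED, drop the 0x/0X that
   printf places between the sign and the hexadecimal digits.  */
static void format_hexfloat (char *buf, size_t size, double v,
			     int prefixed, int upper)
{
  int n = snprintf (buf, size, upper ? "%A" : "%a", v);
  assert (n > 0 && (size_t) n < size);
  if (!prefixed)
    {
      /* The number proper starts after the sign, if any.  */
      char *p = buf + (buf[0] == '-' || buf[0] == '+');
      if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X'))
	memmove (p, p + 2, strlen (p + 2) + 1);
    }
}
```

Since the prefix sits between the sign and the digits, any post-processing of the digits has to know where the number proper starts. This matches the role of the __offset parameter mentioned in the ChangeLog below.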
libstdc++-v3/ChangeLog:
* include/std/format (_Pres_type::_Pres_p, _Pres_type::_Pres_P):
Change the values to which they are defined.
(__formatter_fp::format): Append 0x/0X if _M_type is _Pres_p/_Pres_P
respectively.
(__formatter_fp::_M_localize): Add __offset parameter representing
start of number value (after sign and prefix).
(__formatter_ptr::parse, __formatter_ptr::_M_default)
(__formatter_ptr::__formatter_ptr): Remove unused __type parameter,
and replace use of _Pres_p/P with _Pres_x/X.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jeff Law [Fri, 3 Jul 2026 13:50:13 +0000 (07:50 -0600)]
[RISC-V] Utilize shNadd.uw during ADD synthesis
So this is the next area to improve in the ALU synthesis code. In this case
we're looking to improve ADD synthesis. In particular we're missing the
ability to utilize the shNadd.uw instructions. In RTL these look like:
In particular, note the masking. Essentially we're zero-extending the shifted
operand from SI to DI before shifting. That's the only difference from the
more common shNadd insns. Since we already support using shNadd for ADD
synthesis, the number of opportunities to exploit the .uw variant is relatively
limited. Given x + C, the relevant values of C have bit 31+N set (where N is 1,
2 or 3) and bits 32+N and above clear. Additionally, the low 12+N bits must be clear.
Otherwise there are other ways to synthesize the addition. But for those
limited constants we can use a lui+shNadd.uw sequence. Concretely, consider:
unsigned long foo(unsigned long src) { return src + 0x1ffffe000; }
That currently generates:
li a5,-4096
slli.uw a5,a5,1
add a0,a0,a5
Instead we want to generate:
li a5,-4096
sh1add.uw a0,a5,a0
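The arithmetic behind this sequence can be double-checked with a small C model. It assumes the Zba definition of shNadd.uw, rd = rs2 + (zero_extend_32 (rs1) << N), and encodes the constant conditions stated above. The helper names are hypothetical, and the model says nothing about how synthesize_add in riscv.cc is actually written.

```c
#include <assert.h>
#include <stdint.h>

/* shNadd.uw rd, rs1, rs2: zero-extend the low 32 bits of RS1, shift left
   by N and add RS2.  */
static uint64_t shnadd_uw (uint64_t rs1, uint64_t rs2, unsigned n)
{
  assert (n >= 1 && n <= 3);
  return rs2 + ((uint64_t) (uint32_t) rs1 << n);
}

/* Bit 31+N of C set, all bits above it clear, low 12+N bits clear.  */
static int lui_shnadd_uw_ok (uint64_t c, unsigned n)
{
  return (c >> (31 + n)) == 1 && (c & ((1ULL << (12 + n)) - 1)) == 0;
}

/* The li/lui operand: C >> N, sign-extended from 32 bits as lui does.  */
static int64_t lui_operand (uint64_t c, unsigned n)
{
  uint32_t v = (uint32_t) (c >> n);
  return (int64_t) (v ^ 0x80000000u) - 0x80000000LL;
}
```

For C = 0x1ffffe000 and N = 1, lui_operand yields -4096, matching the li a5,-4096 above. The .uw zero extension then undoes the sign extension that lui introduced.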
You could legitimately ask why this isn't a simple combine pattern to squash
the slli.uw and add together. Enter our friend mvconst_internal. Its existence
encourages GCC to use the constant 0x1ffffe000 as-is in the RTL. So we end up
with the original x+C case again, and mvconst_internal expands the constant back
into li+slli.uw after combine.
And that's also the reason why this has no testcase. We get the code we want
during expansion, then combine+mvconst_internal do their thing and undo our
carefully crafted RTL, at least in the small isolated tests I've looked at. So
at this time the patch is largely a NOP, but it's a step on the path toward
eliminating mvconst_internal, as it's another set of complex constants we can
avoid synthesizing at least some of the time. It's also possible the early code
generation survives in larger contexts; I haven't really looked at that.
Given the difficulty in testing, I've mostly relied on looking at the expansion
code manually. Not great. But just in case, this has been tested on
riscv32-elf & riscv64-elf without regressions. It's also bootstrapped and
regression tested on the c920 and k3. Waiting on pre-commit CI before moving
forward.
gcc/
* config/riscv/riscv.cc (synthesize_add): Utilize shNadd.uw when
appropriate.
Jeff Law [Fri, 3 Jul 2026 13:47:09 +0000 (07:47 -0600)]
[RISC-V] Improve AND synthesis
So, a couple of changes to AND synthesis.
First, much like the ADD and IOR/XOR cases, we should be using riscv_integer_cost
rather than riscv_const_insns. This results in improvements for exactly the same
constants as in the IOR/XOR case. This also introduces the ability to use rotation
to help AND synthesis.
Let's consider x & 0xff000000000000ff. It currently generates this:
li a5,-1
slli a5,a5,56
addi a5,a5,255
and a0,a0,a5
The trick here is to realize we only have 16 bits that are relevant and they can
be clustered together. So we rotate x right by 56 bit positions, then clear
the high bits using zext.h, then rotate right by 8 more positions to put the bits
back into their proper position. That looks like this:
rori a0,a0,56
zext.h a0,a0
rori a0,a0,8
We can also use zext.w and andi for the bit clearing step.
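A quick C model of the three-instruction sequence follows. It assumes rori is a 64-bit rotate right and zext.h zero-extends the low 16 bits. rotr64 and and_via_rotate are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit rotate right, as rori does on RV64.  */
static uint64_t rotr64 (uint64_t x, unsigned r)
{
  assert (r < 64);
  return r == 0 ? x : (x >> r) | (x << (64 - r));
}

/* x & 0xff000000000000ff via rori 56; zext.h; rori 8.  */
static uint64_t and_via_rotate (uint64_t x)
{
  x = rotr64 (x, 56);	/* top byte -> bits 0..7, low byte -> bits 8..15 */
  x &= 0xffff;		/* zext.h: clear bits 16..63 */
  return rotr64 (x, 8);	/* bits 0..7 -> 56..63, bits 8..15 -> 0..7 */
}
```

The two rotations add up to 64 bit positions, so every surviving bit ends up where it started. Only the contiguous 16-bit window in between has to be masked.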
Built and tested on riscv32-elf and riscv64-elf and bootstrapped + regression
tested on the c920 and k3. Waiting on pre-commit CI before moving forward.
gcc/
* config/riscv/riscv.cc (and_synthesis): Use riscv_integer_cost rather
than riscv_const_insns. Use rotate+clear_bits+rotate when useful.
gcc/testsuite/
* gcc.target/riscv/and-synthesis-3.c: New test.
Tomasz Kamiński [Wed, 24 Jun 2026 09:40:08 +0000 (11:40 +0200)]
libstdc++: Provide defined behavior for unrecognized _Pres_type values.
If new _Pres_type values are introduced for a given type, they may
show up as unrecognized _Pres_type values when a TU using them is
linked with a TU compiled with an older release and the format code from
the old TU is selected.
For most of the formatters, the default implementation is used as the
fallback; however, __formatter_int and __formatter_fp were treating
that as UB, due to the call to __builtin_unreachable in the default branch
of the switch. This patch addresses this by falling back to the _Pres_none
behavior in such cases.
Note that for C++20 this affects only programs using a non-Unicode literal
encoding. Otherwise __do_vformat_to is exported from the library,
and thus the newest version is always picked.
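Below is a minimal C sketch of the fallback structure, using hypothetical names rather than the actual std::format internals. Any presentation value the switch does not recognize takes the same path as the "none" presentation instead of reaching __builtin_unreachable.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical presentation types known to this (older) translation unit.  */
enum pres_type { PRES_NONE, PRES_D, PRES_X };

/* Format V according to TYPE.  A value added by a newer release is not
   listed, so it reaches the default label and behaves like PRES_NONE.  */
static int format_uint (char *buf, size_t size, int type, unsigned v)
{
  assert (buf != NULL && size > 0);
  switch (type)
    {
    case PRES_X:
      return snprintf (buf, size, "%x", v);
    case PRES_NONE:
    case PRES_D:
    default:
      return snprintf (buf, size, "%u", v);
    }
}
```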
libstdc++-v3/ChangeLog:
* include/std/format (__formatter_int::format)
(__formatter_fp::format): For unrecognised _M_spec._M_type
values (default branch of switch) fall through to _Pres_none.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Thu, 2 Jul 2026 10:20:45 +0000 (12:20 +0200)]
libstdc++: Return optional<const _Ex&> from exception_ptr_cast.
This implements the remaining part of P3981R2 Better return types
in std::inplace_vector and std::exception_ptr_cast.
The following functions are defined out of line in the
<optional> header, so that header must be included before they
are used:
* value - throws bad_optional_access, which would cause a cyclic include
of <exception>,
* transform, and_then - couldn't be used with functions that
return a value (wrapped) in optional, as they would instantiate the
primary specialization,
* or_else - would reintroduce the dependency on bits/invoke.h,
* begin/end - depend on __normal_iterator from bits/stl_iterator.h.
value_or was also defined out of line for consistency; I think
it would be confusing if calling value/transform required the <optional>
include but value_or did not.
This patch also introduces an _S_from_ptr function in optional<T&>, which
allows it to be constructed from a pointer without needing to check for
null. This function is public, as I believe it will be useful in more
places.
libstdc++-v3/ChangeLog:
* include/bits/optional_ref.h (optional<_Tp&>::_S_from_ptr):
Define.
* include/bits/version.def (exception_ptr_cast): Bump value
to 202603.
* include/bits/version.h: Regenerate.
* libsupc++/exception_ptr.h (exception_ptr_cast)
[__cpp_lib_exception_ptr_cast >= 202603L]: Change return
type to optional<const _Ex&>.
* testsuite/18_support/exception_ptr/exception_ptr_cast.cc:
Modify to handle change in the return type, and add test
for type convertible to optional to reference to that value.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Wed, 1 Jul 2026 08:47:59 +0000 (10:47 +0200)]
libstdc++: Extract optional<T&> specialization to bits/optional_ref.h.
This patch extracts the optional<_Tp&> specialization into the new
bits/optional_ref.h file. This allows it to be used as the return
type of exception_ptr_cast (as required by P3981R2) from <exception>,
without introducing a cyclic dependency.
In addition to the reference specialization, the definitions of
nullopt_t, nullopt, __is_valid_contained_type..., and _Optional_func
required for it are moved to this new file. We also forward declare
__gnu_cxx::__normal_iterator, which is required for the iterator type definition.
To minimize the set of dependencies, the following methods remain
defined (now out of line) in the <optional> header:
* value - requires __throw_bad_optional_access
* begin/end - require __normal_iterator
* and_then, or_else, transform, constructor from _Optional_func
- depend on invoke
* value_or - moved for consistency
Furthermore, to avoid introducing a dependency on <concepts>,
the requires clause for or_else is changed from invocable<_Fn>
to is_invocable_v<_Fn>. This is conforming, as the standard specifies
the former in Constraints, and users cannot rely on subsumption.
Finally, in_place_t (and the other tags) are extracted into a separate
bits/inplace_tags.h header, removing the dependency on bits/utility.h.
libstdc++-v3/ChangeLog:
* include/Makefile.am (bits/optional_ref.h): Add.
* include/Makefile.in: Regenerate.
* include/bits/inplace_tags.h: New file.
* include/bits/utility.h (std::in_place_t, std::in_place)
(std::in_place_type_t, std::in_place_type)
(std::in_place_index_t, in_place_index): Move to
bits/inplace_tags.h.
* include/bits/optional_ref.h: New file.
* include/std/optional (std::nullopt_t, std::nullopt)
(std::__is_valid_contained_type_for_optional)
(std::_Optional_func, std::optional<_Tp&>)
(std::__is_optional_ref_v, std::__optional_ref_base):
Moved to bits/optional_ref.h.
(optional<_Tp&>::begin, optional<_Tp&>::end)
(optional<_Tp&>::value, optional<_Tp&>::value_or)
(optional<_Tp&>::and_then, optional<_Tp&>::transform)
(optional<_Tp&>::optional(_Optional_func<_Fn>, _Value)):
Define out of line.
(optional<_Tp&>::or_else): Define out of line, and
change requires from invocable<_Fn> to is_invocable_v<_Fn>.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Iain Sandoe [Thu, 25 Jun 2026 06:21:35 +0000 (07:21 +0100)]
objective-c,c++: Make declarations language-aware [PR124260].
This is preparation for the main part of the fix for the PR. In this patch we
make sure to build decls with language-specific data when the compiler
is Objective-C++. This is because Objective-C metadata are extern "C",
and recording that requires the lang-specific data.
Richard Biener [Fri, 3 Jul 2026 11:18:43 +0000 (13:18 +0200)]
Make BB SLP costing more verbose
The following also dumps the costs for parts of the SLP graph that
are associated with another loop when there's one that was not
profitable. It also prints markers where the scalar/vector parts
start, for easier debugging.
* tree-vect-slp.cc (vect_bb_vectorization_profitable_p): Make
profitability decision easier to debug.
tree-optimization: fold memset with length in [0, 1] to conditional store [PR102202]
The patch improves memset optimization when the length is known to be 0 or 1.
It uses Ranger information to recognize such cases, shrink-wraps the call on
the zero-length case and replaces the one-byte case with a direct byte store.
It also extends gimple_fold_builtin_memset to handle Ranger-proven singleton
lengths, not just integer constants.
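At the source level, the effect of the transform can be pictured with the C sketch below. It assumes the length has already been proven to lie in [0, 1], which the sketch models with an assert, and it shows the before and after forms. The function names are hypothetical, and the real pass works on GIMPLE, not C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Before: a library call whose length is known to be 0 or 1.  */
static void memset_before (unsigned char *p, int c, size_t len)
{
  memset (p, c, len);
}

/* After: the zero-length case does nothing, and the one-byte case is a
   direct byte store (memset likewise converts C to unsigned char).  */
static void memset_after (unsigned char *p, int c, size_t len)
{
  assert (len <= 1);	/* stands in for the range the pass has proven */
  if (len != 0)
    *p = (unsigned char) c;
}
```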
gcc/ChangeLog:
PR tree-optimization/102202
* tree-call-cdce.cc: Include "tree-ssanames.h", "gimple-fold.h".
(len_has_boolean_range_p): New function.
(can_shrink_wrap_len_p): New function.
(gen_zero_len_conditions): New function.
(shrink_wrap_len_call): New function.
(shrink_wrap_conditional_dead_built_in_calls): Dispatch to
shrink_wrap_one_memset_call for memset calls eligible for the [0, 1]
length transform, ahead of the generic LHS and range-test paths.
(pass_call_cdce::execute): Collect memset calls satisfying
can_shrink_wrap_memset_p as shrink-wrap candidates.
gcc/testsuite/ChangeLog:
PR tree-optimization/102202
* gcc.dg/pr102202-1.c: New test.
* gcc.dg/pr102202.c: New test.
* gcc.target/aarch64/pr100518.c: Modify to handle the warning.
Kyrylo Tkachov [Sun, 28 Jun 2026 17:48:28 +0000 (10:48 -0700)]
aarch64: Fold merging svextb/svexth/svextw with a ptrue to AND [PR120027]
For unsigned types the svextb, svexth and svextw intrinsics are plain
zero-extends, which the expander already lowers to a bitwise AND with a
constant mask. The any/don't-care (_x) form and the zeroing (_z) form with
an all-true predicate therefore compile to a single unpredicated AND, but
the merging (_m) form with an all-true predicate did not: it kept the
inactive argument and produced a predicated UXT. For example
ptrue p3.b, all
mov z31.d, z0.d
movprfx z0, z1
uxtb z0.d, p3/m, z31.d
ret
where a single
and z0.d, z0.d, #0xff
ret
is sufficient, because the all-true predicate makes the inactive operand
dead.
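A plain C model of the lane semantics shows why the inactive operand is dead. This is a sketch, not the ACLE intrinsic itself. The name `extb_u64_m_model` and the explicit lane count are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an unsigned, byte-sized, merging (_m) extend on
   64-bit lanes.  Active lanes receive the zero-extended low byte of OP.
   Inactive lanes keep the value from INACTIVE.  */
static void
extb_u64_m_model (uint64_t *res, const uint64_t *inactive,
                  const bool *pg, const uint64_t *op, size_t n)
{
  for (size_t i = 0; i < n; i++)
    res[i] = pg[i] ? (op[i] & 0xff) : inactive[i];
}
```

When every predicate lane is true, `inactive` is never read, so each lane is simply `op & 0xff`. That is the BIT_AND_EXPR the new fold produces.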
Give svext_bhw_impl a gimple fold that rewrites the unsigned merging form
with an all-true predicate to a BIT_AND_EXPR. Signed types (which use a
real sign-extend instruction), partial predicates and pfalse predicates are
left to the existing handling, as is the _x form, which the expander already
turns into an AND.
Bootstrapped and tested on aarch64-none-linux-gnu.
PR target/120027
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-base.cc (svext_bhw_impl::fold):
New member function. Fold the unsigned svextb/svexth/svextw
intrinsics to a bitwise AND when the merging form has an all-true
predicate.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general/pr120027.c: New test.
Mikael Morin [Fri, 3 Jul 2026 09:16:04 +0000 (11:16 +0200)]
fortran: Don't reuse original descriptors for packed arrays [PR125998]
Don't try to reuse the original descriptor when creating a descriptor
for packed data in gfc_conv_array_parameter.
For a non-transposed array (anything but TRANSPOSE(VAR)) the original
descriptor was reused, and only the data was reset to the result of
packing. This is wrong because the original descriptor can be
non-contiguous, in which case it can't be correct as the descriptor of
packed data.
In the testcase, a dummy array associated with a transposed array actual
argument matches this case. Reusing the descriptor in that case would
cause the packed data to be used transposed, that is, not in the normal
array element order. This change removes this case so that we always fall
back to what was previously the transposed case (see next).
For the transposed case (TRANSPOSE(VAR)), the full dimensions of the
original untransposed descriptor were reused (in other words, the
dimensions of VAR). This is wrong because packing doesn't change the
shape, so the packed array should have the same bounds as the bounds
of TRANSPOSE(VAR), not the same bounds as VAR. Reusing the strides of
an unpacked array for a packed array doesn't seem right either. This
change uses matching dimensions when copying from the original
descriptor, and only copies the lbound and ubound. The strides are
recalculated. And then the offset is recalculated as well (even though
I couldn't find a testcase where it made a difference).
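A small C sketch of the new scheme: copy only the bounds from the matching dimension, then recompute the column-major strides and the offset for contiguous data. The `struct desc` layout below is hypothetical and much simpler than gfortran's real array descriptor.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANK 7

/* Hypothetical, simplified array descriptor (not gfortran's layout).  */
struct desc
{
  int rank;
  ptrdiff_t offset;
  ptrdiff_t lbound[MAX_RANK], ubound[MAX_RANK], stride[MAX_RANK];
};

/* Build a descriptor for packed (contiguous, column-major) data.  */
static struct desc
packed_descriptor (const struct desc *src)
{
  struct desc d = { 0 };
  ptrdiff_t stride = 1;

  d.rank = src->rank;
  for (int i = 0; i < src->rank; i++)
    {
      ptrdiff_t extent = src->ubound[i] - src->lbound[i] + 1;
      if (extent < 0)
        extent = 0;
      /* Bounds come from the matching dimension of SRC...  */
      d.lbound[i] = src->lbound[i];
      d.ubound[i] = src->ubound[i];
      /* ...but strides are recomputed; SRC's strides are ignored.  */
      d.stride[i] = stride;
      /* Offset chosen so that the first element maps to index 0.  */
      d.offset -= d.lbound[i] * stride;
      stride *= extent;
    }
  return d;
}
```

For a 3x2 array with lower bounds of 1, this gives strides 1 and 3 and an offset of -4, whatever strides the source descriptor had.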
PR fortran/97592
PR fortran/125998
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_conv_array_parameter): Always create a new
descriptor. Copy lbound and ubound from the original descriptor
using matching dimension indexes. Recalculate stride and
offset.
Tamar Christina [Fri, 3 Jul 2026 08:15:01 +0000 (09:15 +0100)]
middle-end: Handle variable-length vector types in store_constructor
Currently vec_init does not support VLA vectors, so we instead fall back to
storing piecewise through memory.
However, there are no defined semantics for this. This patch defines them:
for VLA constructors the vector is first cleared to zero and then
constructed piecewise from the scalar elements. This means unspecified
elements are initialized to zero.
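A scalar C model of these constructor semantics. It is a sketch under stated assumptions: the lane count `vl` stands in for the runtime vector length, and `init_vla_vector` is a hypothetical name. The model also silently ignores any initializers beyond `vl`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clear every lane of a vector whose length VL is only known at run
   time, then store the K given elements piecewise.  */
static void
init_vla_vector (int32_t *vec, size_t vl, const int32_t *elts, size_t k)
{
  memset (vec, 0, vl * sizeof *vec);
  for (size_t i = 0; i < k && i < vl; i++)
    vec[i] = elts[i];
}
```

With `vl` set to 8 (for example, 256-bit vectors of 32-bit lanes), `{a, b, c}` fills lanes 0 to 2 and leaves lanes 3 to 7 as zero.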
Without this patch
#include <arm_sve.h>
svint32_t __attribute__ ((noipa))
func_init4 (int32_t a, int32_t b, int32_t c)
{
svint32_t temp = {a, b, c};
return temp;
}
Jeff Law [Fri, 3 Jul 2026 04:36:47 +0000 (22:36 -0600)]
[RISC-V] Improve IOR/XOR synthesis for expensive constant cases
So, much like the changes to add_synthesis, this adjusts xor/ior synthesis to
use riscv_integer_cost rather than riscv_const_insns. For those that didn't
read the add_synthesis patch, what happens is riscv_const_insns returns 0 for
constants requiring more than 3 insns to synthesize. So imagine the
original constant had cost 5 and its bit inversion had cost 4. Both get
converted to "0" because they're over the maximal value and thus we can't
distinguish between them and we fail to use C' with XNOR, ORN or ANDN to
improve the resulting code.
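A C sketch of the two ideas in this paragraph. First, the identities that let ~C replace C: `x | C == orn (x, ~C)` and `x ^ C == xnor (x, ~C)`. Second, why clamping loses the choice between the two constants. The costs 5 and 4 are the example figures from the text. `clamp_cost` only mimics the described clamping and is not riscv_const_insns.

```c
#include <assert.h>
#include <stdint.h>

/* Zbb ORN and XNOR modelled in C.  */
static uint64_t orn (uint64_t a, uint64_t b) { return a | ~b; }
static uint64_t xnor (uint64_t a, uint64_t b) { return ~(a ^ b); }

/* Clamping as described in the text: any constant needing more than
   3 insns is reported as 0.  */
static int clamp_cost (int cost) { return cost > 3 ? 0 : cost; }

/* Materialize ~C and use ORN/XNOR only if ~C is strictly cheaper.  */
static int prefer_inverse (int cost_c, int cost_not_c)
{
  return cost_not_c < cost_c;
}
```

With the raw costs, `prefer_inverse (5, 4)` picks ~C. After clamping, both costs become 0, so the cheaper form can no longer be chosen.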
The constants here were actually from the AND cases, but given the common ISA
capability and GCC structure I suspected the AND cases would apply to IOR/XOR,
and they do.
Tested without regression on riscv32-elf and riscv64-elf, also bootstrapped and
regression tested on the k3 and c920. I'll obviously be waiting for pre-commit
CI to do its thing before moving forward.
gcc/
* config/riscv/riscv.cc (synthesize_ior_xor): Use riscv_integer_cost
rather than riscv_const_insns.
Oleg Endo [Mon, 29 Jun 2026 04:38:30 +0000 (13:38 +0900)]
SH: Add movv2sf patterns.
The movv2sf is split into multiple movsf after RA. Without the extra patterns
LRA will get stuck in a reload cycle. At the moment the primary use case of
V2SF mode is the FSCA insn.
gcc/ChangeLog:
PR target/55212
* config/sh/sh.md (movv2sf, movv2sf_i, unnamed splits): New patterns.
* config/sh/sh.cc (sh_hard_regno_mode_ok): Allow V2SF only in fp-regs.
(sh_max_mov_insn_displacement): Return 0 for any float mode.
Kaz Kojima [Sun, 6 Oct 2024 03:34:55 +0000 (12:34 +0900)]
SH: Reduce R0 live ranges around R0-constrained move insns for LRA
Some move and extend patterns can result in longer R0 live ranges and LRA has
trouble dealing with those. This tries to reduce the likelihood of failures by
using insn variants that use an explicit R0-clobber and splitting the move into
two insns. This tricks LRA into thinking it's using non-R0 for the operand.
gcc/ChangeLog:
PR target/55212
* config/sh/sh.md (extend<mode>si2_short_mem_disp_z): New
insn_and_split.
(extend<mode>si2): Use it for LRA.
(mov<mode>_store_mem_index, *mov<mode>_store_mem_index): New patterns.
(mov<mode>): Use it for LRA.
(movsf_ie_store_mem_index, movsf_ie_load_mem_index,
*movsf_ie_store_mem_index, *movsf_ie_load_mem_index): New patterns.
(movsf): Use it for LRA.
Kaz Kojima [Tue, 23 Jun 2026 02:26:21 +0000 (11:26 +0900)]
SH: Adjust fp-reg related move insns to work with LRA
On SH, fp move insns usually don't support displacement addressing modes.
Instead they need additional match_scratch constraints, which LRA has
trouble dealing with. Split movsf_ie_ra into several new patterns to remove
match_scratch as a mitigation. For movdf constant loads add a new sub-pattern.
Use a new pattern movsf_ie_rffr to handle movsf multiword subregs and disable
movsf_ie_ra for reg from/to subreg of SImode.
gcc/ChangeLog:
PR target/55212
* config/sh/predicates.md (pc_relative_load_operand): New predicate.
* config/sh/sh-protos.h (sh_movsf_ie_ra_split_p): Remove.
(sh_movsf_ie_y_split_p): New proto.
(sh_movsf_ie_subreg_multiword_p): New proto.
* config/sh/sh.cc (sh_movsf_ie_ra_split_p): Remove.
(sh_movsf_ie_y_split_p): New function.
(sh_movsf_ie_subreg_multiword_p): New function.
(broken_move): Take movsf_ie_ra into account for fldi cases.
* config/sh/sh.md (movdf_i4_F_z): New insn.
(movdf): Use it when expanding.
(movsf_ie_ra): Use define_insn instead of define_insn_and_split.
Adjust alternatives.
(movsf_ie_rffr): New insn_and_split.
(movsf_ie_F_z, movsf_ie_Q_z, movsf_ie_y): New insns.
(movsf): Use new patterns when expanding.
Jerry DeLisle [Wed, 1 Jul 2026 21:03:23 +0000 (14:03 -0700)]
fortran: [PR126018] Fix rejects character function invocation as stop code
Expressions used in stop codes can be functions as long as they resolve to
integer or character.
PR fortran/126018
gcc/fortran/ChangeLog:
* match.cc (gfc_match_stopcode): Adjust the f2008 error check. If the
STOP code expr type is unknown, do not error. It will be checked in
gfc_resolve_code.
* resolve.cc (gfc_resolve_code): Add checks for EXEC_STOP and
EXEC_ERROR_STOP.
Pengfei Li [Mon, 29 Jun 2026 08:30:45 +0000 (08:30 +0000)]
AArch64: Cap suggested unroll factor for small known-niters loops
The AArch64 backend can suggest an unroll factor to the vectorizer in
order to expose more ILP. However, in some cases the suggested value is
larger than needed. For the test cases added by this patch, the AArch64
backend suggests an unroll factor of 4, but the loops only need 1 or 2
SVE vector iterations respectively to cover their 10 or 20 scalar
iterations.
This patch caps the suggested unroll factor with CEIL (niters, VF) for
small known-niters loops. CEIL is used rather than truncating division
so that the completely unrolled vector loop still covers all scalar
iterations. Reducing the unroll factor below the number of required
vector iterations could require a separate epilogue loop and lead to
worse code generation.
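A minimal C sketch of the cap. The function name is hypothetical, and the VF of 16 used below is an assumed value (for example, 8-bit elements in 128-bit vectors), not taken from the tests. The sketch assumes small `niters`, as the patch does, so `niters + vf - 1` cannot overflow.

```c
#include <assert.h>

/* Cap SUGGESTED so the unrolled vector loop needs no more than
   CEIL (NITERS, VF) vector iterations.  */
static unsigned
cap_unroll_factor (unsigned suggested, unsigned niters, unsigned vf)
{
  assert (vf > 0);
  unsigned needed = (niters + vf - 1) / vf;   /* CEIL (niters, vf) */
  if (needed == 0)
    needed = 1;                               /* keep a valid factor */
  return suggested < needed ? suggested : needed;
}
```

With VF 16, truncating division would give 20 / 16 == 1 and leave 4 scalar iterations for an epilogue. The ceiling gives 2, so the unrolled vector loop covers all 20 iterations.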
Bootstrapped and tested on aarch64-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_vector_costs::determine_suggested_unroll_factor): Add a
loop_vec_info parameter.
(determine_suggested_unroll_factor): Cap the suggested unroll factor for
small-niters loops.
(aarch64_vector_costs::finish_cost): Pass loop_vinfo to
determine_suggested_unroll_factor.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/vect-unroll-1.c: New test.
* gcc.target/aarch64/sve/vect-unroll-2.c: New test.
Jeff Law [Thu, 2 Jul 2026 17:41:09 +0000 (11:41 -0600)]
[RISC-V] Improve ADD synthesis
So testing passed for the V2 of this patch, but there were some minor issues I
felt needed to be addressed.
First in the new pattern, we can use %S to output the right value rather than
recomputing it ourselves. Second, the test was tightened up slightly by adding
missing escapes. Finally, typos in the ChangeLog and a comment in bitmanip.md
were fixed. I didn't go through a full test cycle on those changes, but I did
test those on riscv32-elf and riscv64-elf with no regressions.
Attached is the final patch I'm pushing to the trunk.
--
So I instrumented the 3 ALU synthesis routines and built 502.gcc, then fed
those results into some python code that allows me to compare instruction
counts and total size for tests across LLVM and GCC. Naturally the idea was to
see if there were cases we should handle but were missing.
This fixes two cases in add_synthesis. First, we weren't utilizing add.uw, so
there's a relatively small set of cases where we can take the original constant
C and sign extend it from 32 to 64 bits, resulting in C'. If C' is cheaper to
synthesize than C, then we can load C' into a GPR and then use add.uw. This
(of course) requires the upper 32 bits of C to be zero and bit 31 to be set.
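A C sketch of the add.uw trick on RV64, using the Zba definition `add.uw rd, rs1, rs2: rd = rs2 + zext32 (rs1)`. The constant `C = 0xfffff800` is an example chosen here. Its upper 32 bits are zero and bit 31 is set, and C' is -2048, which fits in a simm12 and is therefore a single `li`.

```c
#include <assert.h>
#include <stdint.h>

/* Zba add.uw rd, rs1, rs2: rd = rs2 + zero_extend (rs1[31:0]).  */
static uint64_t
add_uw (uint64_t rs1, uint64_t rs2)
{
  return rs2 + (uint64_t) (uint32_t) rs1;
}

/* C' = low 32 bits of C, sign-extended to 64 bits.  The
   uint32_t -> int32_t conversion is implementation-defined in ISO C;
   GCC defines it as modulo 2^32.  */
static uint64_t
sext32 (uint64_t c)
{
  return (uint64_t) (int64_t) (int32_t) (uint32_t) c;
}
```

Because add.uw zero-extends the low 32 bits of C' back to C, `add_uw (C', x)` equals `x + C` for every `x`.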
The second case is for INT_MIN. Adding INT_MIN to a register ultimately just
flips the uppermost bit and thus can be implemented with a binvi. Combine (of
course) collapses the bit inversion case back into arithmetic. Given the
result is just a binvi, this patch recognizes that special case as a new
pattern. That has a secondary effect of fixing the xfail for xor-synthesis-2.c
which was failing for precisely this reason.
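A quick C check of the INT_MIN identity. It assumes "INT_MIN" here means the most negative value of the full register width, that is, only bit 63 set on RV64. `binvi` is modelled from the Zbs definition, and both function names are local to this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Zbs binvi rd, rs1, shamt: invert bit SHAMT of rs1 (RV64 model).  */
static uint64_t
binvi (uint64_t x, unsigned shamt)
{
  return x ^ ((uint64_t) 1 << shamt);
}

/* Add the most negative 64-bit value.  Only bit 63 of the addend is
   set and the carry out of bit 63 is discarded, so the sum differs
   from X only in bit 63.  */
static uint64_t
add_min_int64 (uint64_t x)
{
  return x + ((uint64_t) 1 << 63);
}
```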
While exploring the logical space it also came to light that we should be using
riscv_integer_cost rather than riscv_const_insns. The latter clamps at 3. So
if we had C with cost 5 and C' with cost 4 and we can use either, we really
want to use C', but didn't have a way to make that selection. Using
riscv_integer_cost resolves that *and* we generate less junk RTL since we don't
have to call GEN_INT so often. I haven't included a testcase for that in this
patch, but definitely will on the ior/xor/and space.
At this time the synthesis side for addition looks good relative to LLVM, but
sometimes combine is going to undo its work. I checked every case from that
set where GCC has more instructions than LLVM and each and every one was a
scenario where combine+mvconst_internal undid the early synthesis work. So
just more reasons to keep pushing on that problem. I did add a special pattern
for the INT_MIN case. That was trivial and since it collapses to a single insn
with Zbs it seemed like the right thing to do in case combine discovers it from
some other path.
Both GCC and LLVM seem to be missing shNadd.uw support; after some head-banging
I did manage to characterize some cases where shNadd.uw was unique enough to be
useful. That exploration was ongoing when the latest test run fired up so that
support will land in a later patch.
I mentioned my evaluation also looked at code size differences. That brings in
general constant synthesis and there's a significant cluster of cases where
LLVM consistently does better (li|lui+shift sometimes encodes better than
lui+addi). That's already being tracked in bugzilla.
The other insight from this effort is that ADD, IOR, XOR are relatively minor
when compared to AND. I'm filtering out simm12 constants because those are
trivially handled. What was left was ~1k unique constants passed to AND, ~100
to ADD and ~100 to IOR/XOR. Point being, the larger effort towards AND handling
seems more likely to pay dividends. Given the larger set of primitives for AND
it's no surprise we've already spent considerably more effort there.
Tested on riscv32-elf and riscv64-elf with no regressions. Bootstrapped and
regression tested on the K3 and c920 platforms. Waiting on pre-commit CI before
pushing.
gcc/
* config/riscv/bitmanip.md (xor_for_plus_minint): New pattern.
* config/riscv/riscv.cc (synthesize_add): Handle INT_MIN as
bit inversion. Add support for add.uw. Use riscv_integer_cost
rather than riscv_const_insns.
(synthesize_add_extended): Use riscv_integer_cost rather than
riscv_const_insns.
gcc/testsuite/
* gcc.target/riscv/add-synthesis-3.c: New test.
* gcc.target/riscv/xor-synthesis-2.c: No longer xfail.
Currently, we include all the built-ins like __builtin_fdimf32x in
the result of members_of. We probably should skip them. This patch
uses DECL_IS_UNDECLARED_BUILTIN so that we skip __builtin_abs but
include abs.
For ^^::, this reduces the number of elements from 2591 to 651.
AArch64: Enable SVE AES instructions in streaming mode with FEAT_SSVE_AES
FEAT_SSVE_AES makes the existing SVE AES instructions (AESE, AESD, AESMC,
AESIMC) available in Streaming SVE mode.
gcc/ChangeLog:
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Check for SVE2 and SVE_AES directly.
* config/aarch64/aarch64-sve-builtins-sve2.def (REQUIRED_EXTENSIONS):
Make AES builtins streaming-compatible with SSVE AES.
* config/aarch64/aarch64-sve2.md: Use TARGET_SVE_AES instead of
TARGET_SVE2_AES.
* config/aarch64/aarch64.h (TARGET_SVE2_AES): Rename to...
(TARGET_SVE_AES): Add support for SSVE AES.
* config/aarch64/iterators.md: Use TARGET_SVE2 for VNx2DI
PMULL pair mode.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/sve/aarch64-ssve.exp: Test SVE AES intrinsics
as streaming-compatible with +ssve-aes.
* gcc.target/aarch64/sve2/acle/asm/aesd_u8.c: Use +ssve-aes for
streaming-compatible tests.
* gcc.target/aarch64/sve2/acle/asm/aese_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c: Likewise.
* lib/target-supports.exp: Test aes intrinsics in streaming mode.
aarch64: Fix tls debuginfo missing location info [PR97344]
This patch fixes the missing debuginfo for the TLS variables by emitting
".xword %dtprel(symbol)" along with DW_AT_location in .debug_info section.
Support for the assembler directive ".xword %dtprel(symbol)" was recently
introduced. To prevent assembler errors when building GCC with older
versions of binutils, the patch adds a configure check that skips these
changes if the assembler does not support ".xword %dtprel(symbol)".
Related ABI changes are proposed here [1].
[1] https://github.com/ARM-software/abi-aa/pull/330
arm: make the SHAPE macro take a trailing semicolon
The SHAPE macro is used like a statement, but as currently written does
not take a trailing semicolon. This can be confusing for context-sensitive
editors as it looks like a statement, but isn't really. Fix this by
tweaking the macro to use a no-op trailing statement that now requires
a semicolon. The ATTRIBUTE_UNUSED isn't really needed but makes the
intent clearer.
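A hypothetical C sketch of the idea. It is not the real SHAPE macro from the arm/aarch64 shapes files. The macro's real content ends in `}`, so a no-op trailing declaration is appended that only the use site's `;` completes. GCC's ATTRIBUTE_UNUSED is written out here as `__attribute__ ((unused))`.

```c
#include <assert.h>

/* Hypothetical SHAPE-style macro.  The function body ends in '}',
   and the trailing extern declaration is a no-op that is terminated
   by the ';' written after each use.  */
#define SHAPE(NAME)                                               \
  static const char *shape_name_##NAME (void) { return #NAME; }   \
  extern int shape_semicolon_##NAME __attribute__ ((unused))

SHAPE (binary);
SHAPE (unary);
```

Leaving out the `;` after `SHAPE (binary)` would now be a syntax error rather than a silently accepted form.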
gcc/ChangeLog:
* config/arm/arm-mve-builtins-shapes.cc (SHAPE): Add a
trailing statement that lacks a semicolon. Add that to all
existing uses.
aarch64: make the SHAPE macro take a trailing semicolon
The SHAPE macro is used like a statement, but as currently written does
not take a trailing semicolon. This can be confusing for context-sensitive
editors as it looks like a statement, but isn't really. Fix this by
tweaking the macro to use a no-op trailing statement that now requires
a semicolon. The ATTRIBUTE_UNUSED isn't really needed but makes the
intent clearer.
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-shapes.cc (SHAPE): Add a
trailing statement that lacks a semicolon. Add that to all
existing uses.
gcc/configure.ac contains many inline checks if an in-tree gld is of a
version recent enough to enable some feature.
The checks are highly repetitive and hard to read, so this patch
replaces them by two shell functions, gcc_fn_gld_min_version and
gcc_fn_gld_elf_min_version. Both configure.ac and the scripts generated
by autoconf already heavily use shell functions, so they pose no
portability problem.
Tested on x86_64-pc-linux-gnu as follows:
* Bootstrap with out-of-tree gas/gld.
* Non-bootstrap builds with
** the bundled gas/gld (2.44),
** in-tree binutils trunk (2.46.50), and
** fake in-tree builds where bfd/configure and ld/configure were hacked
to pose as gld 2.9 and 2.16 respectively.
Initially a full in-tree build was run. Afterwards, the gcc directory
was moved aside, recreated with make configure-gcc, and the resulting
versions of auto-host.h compared. In addition, the differences between the
various linker versions were as expected.
Mikael Morin [Thu, 2 Jul 2026 08:44:11 +0000 (10:44 +0200)]
fortran: array descriptor: Move debug info generation function [PR122521]
Move the gfc_get_descriptor_offsets_for_info function, which is used
to build debug info of array descriptors, to the trans-descriptor.cc
file.
PR fortran/122521
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_get_descriptor_offsets_for_info): Move
function ...
* trans-descriptor.cc (gfc_get_descriptor_offsets_for_info): ...
to this file.
* trans-array.h (gfc_get_descriptor_offsets_for_info): Move
declaration ...
* trans-descriptor.h (gfc_get_descriptor_offsets_for_info): ...
to this file.
* trans-types.cc: Include trans-descriptor.h.
Jonathan Wakely [Wed, 1 Jul 2026 18:46:42 +0000 (19:46 +0100)]
libstdc++: Refactor preprocessor condition for Windows symlinks
Only define windows_create_symlink when it will actually be functional,
and adjust its callers to not use it unless it's defined. This matches
the form of windows_read_symlink_handle and its caller.
libstdc++-v3/ChangeLog:
* src/c++17/fs_ops.cc (windows_create_symlink): Adjust
preprocessor conditions to not define this at all unless
SYMBOLIC_LINK_FLAG_DIRECTORY is defined.
(fs::create_directory_symlink): Adjust preprocessor conditions
accordingly.
(fs::create_symlink): Likewise.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Wed, 1 Jul 2026 18:44:41 +0000 (19:44 +0100)]
libstdc++: Use atomic store for num_leap_seconds in tzdb.cc
Although NumLeapSeconds::set_locked is always called with list_mutex()
locked, if ATOMIC_INT_LOCK_FREE == 2 then readers of the variable will be
loading it without list_mutex() locked. We need to use an atomic store
even if the lock is held.
Also simplify the non-atomic version of NumLeapSeconds::set to just set
the variable instead of indirecting via set_locked.
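The same pattern expressed with C11 atomics. This is a sketch, not libstdc++'s code: a small spinlock stands in for list_mutex(), and the function names and memory orders are assumptions. Holding the lock does not make a plain store safe, because the readers never take the lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for list_mutex(): a minimal spinlock.  */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;
static atomic_int num_leap_seconds;

static void
lock_list (void)
{
  while (atomic_flag_test_and_set_explicit (&list_lock, memory_order_acquire))
    ; /* spin */
}

static void
unlock_list (void)
{
  atomic_flag_clear_explicit (&list_lock, memory_order_release);
}

/* Called with the lock held, but the store is still atomic because
   readers do not take the lock.  */
static void
set_locked (int n)
{
  atomic_store_explicit (&num_leap_seconds, n, memory_order_release);
}

static void
set_leap_seconds (int n)
{
  lock_list ();
  set_locked (n);
  unlock_list ();
}

/* Lock-free reader.  */
static int
get_leap_seconds (void)
{
  return atomic_load_explicit (&num_leap_seconds, memory_order_acquire);
}
```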
libstdc++-v3/ChangeLog:
* src/c++20/tzdb.cc (_Node::NumLeapSeconds::set_locked)
[ATOMIC_INT_LOCK_FREE == 2]: Use atomic store.
(_Node::NumLeapSeconds::set) [ATOMIC_INT_LOCK_FREE != 2]: Set
value directly instead of calling set_locked.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Thomas Schwinge [Wed, 1 Jul 2026 21:24:46 +0000 (23:24 +0200)]
OpenMP: support for uses_allocators clause: Fix libgomp build for PTX < 4.1
Fix-up for commit 7a8f98b48104ccf10c6dceccc51b70de69288eaa
"OpenMP: support for uses_allocators clause", which regressed libgomp build for
PTX < 4.1 configurations:
In file included from [...]/libgomp/config/nvptx/allocator.c:48:
[...]/libgomp/config/nvptx/allocator.c:47:28: error: ‘__nvptx_lowlat_realloc’ defined but not used [-Werror=unused-function]
47 | #define BASIC_ALLOC_PREFIX __nvptx_lowlat
| ^~~~~~~~~~~~~~
[...]/libgomp/config/nvptx/../../basic-allocator.c:63:27: note: in definition of macro ‘fn1’
63 | #define fn1(prefix, name) prefix ## _ ## name
| ^~~~~~
[...]/libgomp/config/nvptx/../../basic-allocator.c:69:29: note: in expansion of macro ‘fn’
69 | #define basic_alloc_realloc fn(BASIC_ALLOC_PREFIX,realloc)
| ^~
[...]/libgomp/config/nvptx/../../basic-allocator.c:69:32: note: in expansion of macro ‘BASIC_ALLOC_PREFIX’
69 | #define basic_alloc_realloc fn(BASIC_ALLOC_PREFIX,realloc)
| ^~~~~~~~~~~~~~~~~~
[...]/libgomp/config/nvptx/../../basic-allocator.c:257:1: note: in expansion of macro ‘basic_alloc_realloc’
257 | basic_alloc_realloc (char *heap, void *addr, size_t oldsize,
| ^~~~~~~~~~~~~~~~~~~
[...]/libgomp/config/nvptx/allocator.c:47:28: error: ‘__nvptx_lowlat_calloc’ defined but not used [-Werror=unused-function]
47 | #define BASIC_ALLOC_PREFIX __nvptx_lowlat
| ^~~~~~~~~~~~~~
[...]/libgomp/config/nvptx/../../basic-allocator.c:63:27: note: in definition of macro ‘fn1’
63 | #define fn1(prefix, name) prefix ## _ ## name
| ^~~~~~
[...]/libgomp/config/nvptx/../../basic-allocator.c:67:28: note: in expansion of macro ‘fn’
67 | #define basic_alloc_calloc fn(BASIC_ALLOC_PREFIX,calloc)
| ^~
[...]/libgomp/config/nvptx/../../basic-allocator.c:67:31: note: in expansion of macro ‘BASIC_ALLOC_PREFIX’
67 | #define basic_alloc_calloc fn(BASIC_ALLOC_PREFIX,calloc)
| ^~~~~~~~~~~~~~~~~~
[...]/libgomp/config/nvptx/../../basic-allocator.c:167:1: note: in expansion of macro ‘basic_alloc_calloc’
167 | basic_alloc_calloc (char *heap, size_t size)
| ^~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[7]: *** [Makefile:819: allocator.lo] Error 1
If, for PTX < 4.1, we're not providing OpenMP low-latency memory, just disable
the whole '__nvptx_lowlat_pool' machinery.
Philipp Tomsich [Wed, 1 Jul 2026 10:19:30 +0000 (12:19 +0200)]
tree-optimization/124545 - VN: add inverse widening lookup for PLUS/MINUS
visit_nary_op canonicalises (T)(A + C) into (T)A + (T)C for its VN
lookup, but not the reverse -- so whether VN discovers (T)A + C ==
(T)(A + C) depends on which form it sees first. Add a match.pd rule
that rewrites (T)A +- CST into (T)(A +- CST') using the op! qualifier,
so the fold only fires when the narrow expression already has a value
number -- i.e. only inside VN via mprts_hook.
Restrict to TYPE_OVERFLOW_UNDEFINED inner types: for unsigned inner the
narrow op wraps mod 2^prec (defined) while the widened outer op does
not, changing the observed value (bitfld-5.c is the concrete miscompile
when the guard is loosened).
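A C illustration of why the rule is restricted to inner types with undefined overflow. The `uint8_t` inner type and the constant 10 are examples chosen here. With an unsigned narrow type, the two forms differ as soon as the narrow addition wraps.

```c
#include <assert.h>
#include <stdint.h>

/* (T)A + C: widen first, then add in the wide type.  */
static int
widen_then_add (uint8_t a, int c)
{
  return (int) a + c;
}

/* (T)(A + C): add, reduce modulo 2^8 as a uint8_t narrow op would,
   then widen.  */
static int
add_then_widen (uint8_t a, int c)
{
  return (int) (uint8_t) (a + c);
}
```

For `a = 250` the results are 260 and 4. For a signed inner type with undefined overflow, a valid program cannot reach such a wrap, so the two forms agree whenever behaviour is defined.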
Use wi::min_precision (CST, SIGNED) rather than int_fits_type_p for the
fits-check, so sign-encoded small negatives (e.g. -1 as sizetype's
0xFFFF...FFFF) qualify.
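A C sketch of a signed minimum-precision check, written in the spirit of `wi::min_precision (CST, SIGNED)` but not GCC's implementation. It shows why -1, stored in an unsigned type as all-ones, still fits in a narrow signed type once it is read as sign-encoded. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to represent V as a two's-complement signed value.  */
static int
min_precision_signed (int64_t v)
{
  /* For negative V, ~V has the same significant bits; count them and
     add one for the sign bit.  */
  uint64_t u = v < 0 ? ~(uint64_t) v : (uint64_t) v;
  int bits = 0;
  while (u)
    {
      bits++;
      u >>= 1;
    }
  return bits + 1;
}

/* Does V fit a signed type of precision PREC?  */
static int
fits_signed (int64_t v, int prec)
{
  return min_precision_signed (v) <= prec;
}
```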
PR tree-optimization/124545
gcc/ChangeLog:
* match.pd: Add (T)A +- CST -> (T)(A +- CST') for widening
conversions from a signed inner type with undefined overflow.
gcc/testsuite/ChangeLog:
* gcc.dg/pr124545.c: New test.
* gcc.dg/pr124545-2.c: New test.
On SH, the special fp constant load instructions 'fldi0' and 'fldi1' are only
valid in single-precision fp mode and thus depend on mode switching. LRA is
not aware of that (or of any mode-switching constraints) and would emit such
constant loads in the wrong mode by changing the alternative of an fp-move
insn without validating its fp-mode attributes.
This new target hook allows rejecting such potentially unsafe substitutions.
This patch has been proposed here
https://gcc.gnu.org/pipermail/gcc-patches/2026-March/709649.html
but was initially rejected, as it's just papering over the real problem.
Further discussion has clarified that this is a general issue in GCC, not only
limited to LRA. Everything that runs after the mode-switching pass
can make potentially unsafe insn transformations because nothing is validating
the insn mode requirements against the current cpu/fpu mode state.
After some reconsideration, this patch was approved
https://gcc.gnu.org/pipermail/gcc-patches/2026-June/722024.html
gcc/ChangeLog:
PR target/117182
PR target/55212
* target.def (cannot_substitute_const_equiv_p): New target hook.
* doc/tm.texi.in: Add it.
* lra-constraints.cc (get_equiv): Use it.
* config/sh/sh.cc (sh_cannot_substitute_const_equiv_p): Override it.
* doc/tm.texi: Re-generate.
Kaz Kojima [Tue, 24 Sep 2024 09:26:42 +0000 (18:26 +0900)]
SH: Pin input args to hard-regs via predicates for sfuncs
Some sfuncs use a hard reg as input and clobber it with a raw reg pattern.
It seems that LRA doesn't process such clobber patterns. Rewrite these
patterns so that they work with LRA.
gcc/ChangeLog:
PR target/55212
* config/sh/predicates.md (hard_reg_r0..r7): New predicates.
* config/sh/sh.md (udivsi3_i4, udivsi3_i4_single,
udivsi3_i1): Rewrite with match_operand and match_dup.
(block_lump_real, block_lump_real_i4): Ditto.
(udivsi3): Adjust for it.
* config/sh/sh-mem.cc (expand_block_move): Ditto.
Iain Sandoe [Sat, 27 Jun 2026 19:39:39 +0000 (20:39 +0100)]
Darwin: Updates to handling of NeXT metadata [PR124260].
Darwin-specific preparations for changes to handle the PR:
Allow metadata to reside in const_data.
Add ClassList to the cases where we want the symbols to be linker-visible.
This adds expanded comments on the meaning of the flag fields in the ImageInfo
metadata.
PR objc/124260
gcc/ChangeLog:
* config/darwin.cc (darwin_objc2_section): Also allow metadata
in const_data.
(darwin_label_is_anonymous_local_objc_name): Make ClassList linker-
visible.
(darwin_file_end): Update comments on the ImageInfo flags. Do not
claim we have signed pointers.
gcc/testsuite/ChangeLog:
* objc.dg/image-info.m: Test revised flags value for ABI-2.
Paul Thomas [Wed, 1 Jul 2026 16:14:25 +0000 (17:14 +0100)]
Fortran: Fix asan problems with PDT testcases [PR121972]
Co-authored-by: Jerry DeLisle <jvdelisle@gcc.gnu.org>
PR fortran/121972
gcc/fortran
* expr.cc (has_parameterized_comps): Return false if the DT
is neither a pdt_type nor has PDT components. Correct the
logic for a PDT component.
* trans-expr.cc (alloc_scalar_allocatable_for_assignment): Use
calloc for types with parameterized components as well as those
with allocatable components.
* trans-stmt.cc (gfc_trans_allocate): Merge the allocation of
parameterized components of PDTs and class PDTs into one block.
If an allocate type_spec is present that has allocatable comps
where the class declared type does not, nullify the allocatable
components.
gcc/testsuite/
* gfortran.dg/asan/pdt_46.f03: Copy of original with tree dump
for counts of frees, mallocs and callocs removed.
* gfortran.dg/asan/pdt_77.f03: Ditto.
* gfortran.dg/pdt_46.f03: Calloc count added, corresponding to
reduction in mallocs.
* gfortran.dg/pdt_50.f03: Ditto.
Matthias Kretz [Thu, 5 Mar 2026 09:20:53 +0000 (10:20 +0100)]
libstdc++: Add std::complex to the [simd] vectorizable types
The implementation for [simd.bit] is trivial (not explicitly vectorized)
using calls to the scalar functions.
The division operator for vec<complex<T>> is not implemented yet
(instantiation hits an unconditional static_assert).
_M_abs() is not implemented, and cannot be implemented without [simd.math]
(sqrt and hypot).
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add bits/simd_bit.h, bits/simd_complex.h,
and bits/simd_math.h.
* include/Makefile.in: Regenerate.
* include/bits/simd_bit.h: New file.
* include/bits/simd_complex.h: New file.
* include/bits/simd_details.h (__complex_like): New concept.
(__vectorizable): Extend for complex types.
(_AbiVariant): Add _CxIleav and _CxCtgus variants.
(_ScalarAbi, _Abi): Add _S_is_cx_ileav and _S_is_cx_ctgus
members.
(_ArchTraits::_M_have_addsub): New.
(__native_abi): For __complex_like default to _CxIleav _Abi.
Derive the size from the native size for the complex's
value_type.
(__abi_rebind): Implement rebind to and from complex.
(__is_mask_conversion_explicit): Take _Cx* _AbiVariant into
account.
(__value_preserving_convertible_to): Also allow conversion to
complex.
(__simd_unsigned_integer): New.
(__simd_complex_value_type, __simd_complex): New.
* include/bits/simd_loadstore.h (unchecked_load): Use a cast to
the complex's value_type for converting loads to __complex_like.
* include/bits/simd_mask.h (basic_mask): Constrain partial
specializations to non-complex ABI tags.
(basic_mask::basic_mask): Add conversion from _CxIleav ABIs.
(_M_to_uint): Add option to interpret two mask elements as one
result bit. Make use of __x86_cvt_vecmask_to_bitmask.
(basic_mask::basic_mask): Add conversion from _CxCtgus and
_CxIleav masks.
(__select_impl): The complex specialization of basic_vec needs
_S_concat instead of _S_init.
(_M_reduce_count): Implement recursive reduction.
* include/bits/simd_mask_reductions.h (reduce_count): Fall back
to _M_reduce_count() for mask types without unary minus.
* include/bits/simd_math.h: New file.
* include/bits/simd_vec.h (basic_vec): Constrain partial
specializations to !__complex_like.
* include/bits/simd_x86.h (__x86_cvt_vecmask_to_bitmask): New.
* include/bits/vec_ops.h (__is_const_known): __complex_like
arguments are known if real and imag are known.
(_S_complex_negate_real, _S_complex_negate_imag, _S_addsub)
(_S_complex_real_is_const_known_zero)
(_S_complex_imag_is_const_known_zero): New.
* include/bits/version.def: Add simd_complex.
* include/bits/version.h: Regenerate.
* include/std/simd: Ask for simd_complex feature macro. Include
bits/simd_bit.h, bits/simd_complex.h, and bits/simd_math.h.
* testsuite/std/simd/arithmetic.cc: Enable tests for complex
which needs complex_init.h.
* testsuite/std/simd/complex_init.h: Helper for passing complex
values as template arguments.
* testsuite/std/simd/create_tests.h: Add float16_t, and three
complex types.
* testsuite/std/simd/mask2.cc: Add complex test types.
* testsuite/std/simd/simd_bit.cc: New test.
* testsuite/std/simd/simd_bit_expensive.cc: New test.
* testsuite/std/simd/stores.cc: Guard converting stores from
complex.
* testsuite/std/simd/test_setup.h (any_type_of): New.
(complex_like): New.
(bit_equal): Handle multi-reg and complex arguments.
(cx_isinf): New.
(equal_with_nan_and_inf_fixup): Handle complex types.
(is_const_known): Add std::complex overload.
(test_iota): Add support for std::complex.
* testsuite/std/simd/traits_common.cc: Add complex test types.
Test that instantiation of complete classes is well-formed.
* testsuite/std/simd/traits_impl.cc: Add complex test types.
Test __complex_like. Add a test that _CxIleav is dropped when
rebinding to the member of a _CxIleav mask.
Matthias Kretz [Fri, 20 Mar 2026 11:27:26 +0000 (12:27 +0100)]
libstdc++: Refactor _ScalarAbi<N> into _Abi<N, N>
Before this change _Ap::_S_is_bitmask would pick up false from
_ScalarAbi<N>. Now that __scalar_abi_tag identifies any _Abi<N, N, V>,
where V can also identify bit-masks, the short-cut of setting
_S_use_bitmask to _Ap::_S_is_bitmask is wrong. It would be correct to
have it say _Ap::_S_is_bitmask && !__scalar_abi_tag<_Ap>. I decided to
implement the latter only in the _S_nreg == 1 specialization and have
the higher ups inherit the value from their vec/mask member. The
_S_is_bitmask bit is not erased for __scalar_abi_tag since it makes a
difference for __abi_rebind.
* include/bits/simd_details.h (_ScalarAbi): Remove.
(__scalar_abi_tag): Identify _Abi<N, N> as scalar now.
(__native_abi): Replace _ScalarAbi<1> with _Abi_t<1, 1, ...>.
(__abi_rebind): Refactor rebinding from/to __scalar_abi_tag.
* include/bits/simd_mask.h (_S_use_bitmask): Only true if
!_S_is_scalar.
(_M_and_neighbors, _M_or_neighbors): Add case for _S_is_scalar
where the and/or must be executed one step earlier.
(_M_reduce_min_index, _M_reduce_max_index): Delete dead code.
* include/bits/simd_vec.h (_S_use_bitmask): Inherit the value
from the first data member.
* testsuite/std/simd/traits_impl.cc: Adjust for the removal of
_ScalarAbi.
Icenowy Zheng [Tue, 30 Jun 2026 16:11:42 +0000 (00:11 +0800)]
RISC-V: Change initial value for fmin/fmax autovec reduce [PR126049]
Currently the auto-vectorization of C fmin()/fmax() uses
infinity/-infinity as the initial value for reduction, which introduces
bogus infinity values when iterating over an array with only NaNs.
Since C fmin()/fmax(), RV F/D fmin/fmax, RVV vfmin/vfmax and RVV
vfredmin/vfredmax all implement the IEEE 754 minimumNumber or
maximumNumber behavior, an initial value of NaN is more suitable than
infinity when reducing the vector: if the input vector is all NaN, the
result will still be NaN, and if the input vector contains non-NaN
elements, they will replace the initial NaN.
Change the initial value during reduction from corresponding inf to a
quiet NaN.
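To make the difference concrete, here is a scalar C sketch of the reduction semantics (a model only, not the RISC-V expander). It assumes a hand-written min_num() helper that models IEEE 754 minimumNumber: a NaN operand is ignored unless both operands are NaN; the ordering of -0.0 and +0.0 is not modeled. Seeding with infinity turns an all-NaN input into infinity, while a quiet NaN seed keeps the result NaN.

```c
#include <math.h>
#include <stddef.h>

/* Models IEEE 754 minimumNumber (like C fmin()): a NaN operand is
   ignored unless both operands are NaN.  Signed zeros are not ordered. */
static double min_num(double a, double b)
{
  if (isnan(a))
    return b;
  if (isnan(b))
    return a;
  return a < b ? a : b;
}

/* Sequential model of an fmin reduction with a caller-chosen seed. */
static double reduce_min(const double *x, size_t n, double init)
{
  double acc = init;
  for (size_t i = 0; i < n; i++)
    acc = min_num(acc, x[i]);
  return acc;
}
```

For example, reduce_min over {NAN, NAN, NAN} returns infinity when seeded with INFINITY but NaN when seeded with NAN, and over {NAN, 2.5, -1.0} a NAN seed still yields -1.0.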
PR target/126049
gcc/ChangeLog:
* config/riscv/autovec.md: Change fmin/fmax reduction initial
value from inf/-inf to NaN for proper semantics of corresponding
C function.
Robin Dapp [Mon, 29 Jun 2026 20:13:30 +0000 (22:13 +0200)]
lra: Pass INVALID_REGNUM to dependent filter.
When the reference operand for a dependent filter is not yet chosen, we
currently just allow everything. This can lead to situations where we
fix a hard reg early, only for it to be rejected later. That's
unreasonable and cannot be salvaged by lra.
This patch gives the filter a chance to reject early in this situation
and documents that INVALID_REGNUM can be passed to dependent filters.
It also adds Stefan's suggestion to look through subregs before
filtering.
gcc/ChangeLog:
* doc/md.texi: Document new behavior.
* lra-constraints.cc (get_dependent_filter): Call filter with
INVALID_REGNUM ref op instead of allowing everything.
Thomas Schwinge [Mon, 18 May 2026 18:10:12 +0000 (20:10 +0200)]
Remove HAVE_GNU_AS: Adjust for GCN assembler ('llvm-mc')
Fix-up for commit e08a7f620c037275e2c1c5940b56b536077cd98b
"Remove HAVE_GNU_AS", which didn't consider that GCC/GCN is using LLVM's
'llvm-mc' as its assembler. In 'gcc/configure', that one was correctly
detected as 'gas=no':
  # Check if we are using GNU as if not already set.
  if test -z "$gas"; then
    if $gcc_cv_as --version 2>/dev/null | grep GNU > /dev/null; then
      gas=yes
    else
      gas=no
    fi
  fi
..., so didn't get 'HAVE_GNU_AS' defined. Now we're not handling it
specially anymore, so the default 'ASM_V_SPEC' applies, which changes
'gcc/specs':
Jason Merrill [Wed, 1 Jul 2026 15:28:18 +0000 (11:28 -0400)]
c++: co_await and structured bindings [PR124584]
Here the internal guard variables that control whether to clean up the
structured binding variables did not live across the call to co_await
because they weren't promoted into the coroutine frame. The underlying
problem is using a TARGET_EXPR, which lives only for the full-expression, to
hold a value that needs to live as long as the variable itself. So
get_temp_regvar seems like a better fit. We also need to manually pushdecl
the guard variable so that it's visible to register_local_var_uses.
It might be better to use the wrap_temporary_cleanups mechanism for the main
variable, as we do for normal variables. But let's go with the simple fix
for now.
PR c++/124584
gcc/cp/ChangeLog:
* decl.cc (cp_finish_decl): Use get_temp_regvar for decomp guards.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/torture/tuple-decomp-pr124584.C: New test.
Jason Merrill [Wed, 1 Jul 2026 15:28:18 +0000 (11:28 -0400)]
c++: co_await and ->* [PR121094]
The use of TARGET_EXPR in get_member_function_from_ptrfunc since r13-9563
didn't match the expectations of flatten_await_stmt; the latter isn't
prepared to handle reuse of a TARGET_EXPR. Most places in the front-end
only reuse the TARGET_EXPR_SLOT, not the whole TARGET_EXPR, but that's
awkward for get_member_function_from_ptrfunc, so let's make it work.
PR c++/121094
PR c++/117259
gcc/cp/ChangeLog:
* coroutines.cc (replace_proxy): Handle TARGET_EXPR from.
(flatten_await_stmt): Pass it.
Jonathan Wakely [Wed, 1 Jul 2026 14:13:12 +0000 (15:13 +0100)]
libstdc++: Fix build failure in tzdb.cc
The changes in r17-2047-gb12fdd95251178 cause a bootstrap failure if
ATOMIC_POINTER_LOCK_FREE != 2 && ATOMIC_INT_LOCK_FREE == 2 is true for
the target:
.../tzdb.cc: In static member function ‘static const std::chrono::tzdb& std::chrono::tzdb_list::_Node::_S_replace_head(std::shared_ptr<std::chrono::tzdb_list::_Node>, std::shared_ptr<std::chrono::tzdb_list::_Node>)’:
.../tzdb.cc:1617:22: error: ‘struct std::chrono::tzdb_list::_Node::NumLeapSeconds’ has no member named ‘set_locked’
1617 | num_leap_seconds.set_locked(new_head_ptr->db.leap_seconds.size(), lock);
| ^~~~~~~~~~
This fix defines the 'set_atomically' and 'set_locked' functions
unconditionally, and renames the former to just 'set'. This fixes the
mismatch between the atomic pointer and atomic int conditions, as both
functions are available for both branches of the #if/#else in
_Node::_S_replace_head.
libstdc++-v3/ChangeLog:
* src/c++20/tzdb.cc (_Node::NumLeapSeconds::set_atomically):
Rename to set and define unconditionally.
(_Node::NumLeapSeconds::set_locked): Define unconditionally.
Tomasz Kamiński [Tue, 30 Jun 2026 07:46:07 +0000 (09:46 +0200)]
libstdc++: Cascade wall-time saves in lazy expansion seeding [PR124853]
When _M_get_sys_info seeds a Zone line by looking up the active rule
just before info.begin, the previous code interpreted each rule in
isolation against ri.offset() (the line's standard offset alone),
ignoring the running save accumulated by earlier rules in the same
year. For most zones this gives the right answer because the search
only matters when no rule has fired yet, but for zones whose rule
set has wall-time rules whose effective firing time depends on a
prior rule's save it produces wrong answers.
Canonical case: Europe/Paris around 1945. France's rules
R Fr 1945 o - Apr 2 2 2 M
R Fr 1945 o - Sep 16 3 0 -
both use plain wall time. In Paris's stdoff=1 frame, the September
rule's at_time of 03:00 wall translates to UT Sep 16 02:00 if no
prior save is applied, but to UT Sep 16 00:00 once the running save
of 2h from the April rule is taken into account. When seeding a
sys_info whose info.begin falls between those two values, the simple
search picks the April rule (save=2 → CEMT, total offset 3h) when
the correct answer is the September rule (save=0 → CET, total offset
1h). libstdc++ reports this as a sustained CEMT stretch where zic
and libc agree on CET.
To address the above, the search algorithm is now expanded to collect
three rule transitions around the specified time t, while continuing to
ignore the save:
* curr_tran: transition happening before or at time t,
* prev_tran: transition preceding above transition,
* next_tran: transition happening after time t.
This collects sufficient information to adjust the start time (if
wall time is used) of curr_tran (by the save of prev_tran) and of next_tran
(by the save of curr_tran). Assuming that applying the save values does not
change the order of the transitions (cascading saves would be ill-defined
otherwise), after the adjustment the actual active rule is:
* next_tran.rule: if the adjustment pushed next_tran.when to a
time before or at t, which happens for a positive save (see
test_positive),
* prev_tran.rule: if the adjustment pushed curr_tran.when to a time
after t, which happens for a negative save (see test_negative),
* curr_tran.rule: otherwise.
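The selection step above can be sketched in C as follows. This is a simplified model, not the tzdb.cc code: the struct transition type, its fields, and the functions effective_when() and select_active() are hypothetical, times are minutes on an arbitrary UT scale computed with the standard offset only, and the cross-year search described next is left out.

```c
#include <stddef.h>

/* Hypothetical transition record.  'when' is the UT firing time computed
   with the zone's standard offset only, i.e. ignoring any prior save. */
struct transition {
  long long when;  /* minutes, arbitrary UT epoch */
  int save;        /* save (minutes) in effect after this transition */
  int is_wall;     /* nonzero if the rule's AT time is wall-clock time */
};

/* A wall-clock transition fires earlier in UT by the save already in
   effect (or later, if that save is negative). */
static long long effective_when(const struct transition *tr, int prior_save)
{
  return tr->is_wall ? tr->when - prior_save : tr->when;
}

/* curr: last transition with when <= t (ignoring saves); prev: the one
   before it; next: the first one after t.  prev and next may be NULL. */
static const struct transition *
select_active(const struct transition *prev, const struct transition *curr,
              const struct transition *next, long long t)
{
  if (next && effective_when(next, curr->save) <= t)
    return next;  /* positive save pulled next back to or before t */
  if (prev && effective_when(curr, prev->save) > t)
    return prev;  /* negative save pushed curr past t */
  return curr;
}
```

Mirroring the Paris case with relative numbers: an April rule at 0 with save 120 and a September rule at 1000 with save 0 give an effective September time of 880, so for t = 950 the September rule is selected, while for t = 870 the April rule stays active.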
For times at the start (Jan/1) or end (December/31) of the year, we
check, for each rule, the transition in the previous or next year
respectively, in addition to the transition in the year of t (for years
in the range [first_year, last_year]). This handles rules whose firing
(specified in local time) crosses a year boundary due to a large stdoff
or save. One example is Pacific/Auckland's 1946 Jan 1 rule, which, in
stdoff=12h, fires at 1945-12-31 11:30 UT, see test_next_year.
The fallback "earliest STD rule" logic is preserved for the case
where no rule has fired yet, but is extracted to a separate function.
This lookup is optimized by sorting the rules by name, from, and save,
in that order, which groups the std rules of a given year together.
PR libstdc++/124853
libstdc++-v3/ChangeLog:
* src/c++20/tzdb.cc
(time_zone::_M_get_sys_info): Extract code blocks to
separate functions, and invoke them.
(<unnamed>::find_active_rule): Modify algorithm to
handle cascading saves.
(<unnamed>::find_first_std): Simplified implementation
benefiting from reordering of rules.
(chrono::reload_tzdb): Sort rules by name, from and save.
* testsuite/std/time/time_zone/wall_cascade.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Co-authored-by: Álvaro Begué <alvaro.begue@gmail.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Álvaro Begué <alvaro.begue@gmail.com>
Roger Sayle [Wed, 1 Jul 2026 13:31:10 +0000 (14:31 +0100)]
i386: Handle (zero_extend:DI (mem:SI)) in x86's STV.
This patch enhances the i386 backend's stv2 pass to consider the
pattern (zero_extend:DI (mem:SI ...)) to be a candidate for conversion.
Loading an SImode value into an SSE register clears the rest of the
vector, i.e. effectively (v4si){ mem, 0, 0, 0 }, which can be used
to conveniently implement zero-extension to DImode, when performing
V2DImode Scalar-To-Vector (STV) conversion.
Consider the new test case:
long long y,z;
unsigned int p;
void foo()
{
  long long t = p;
  t ^= y;
  z = t;
}
With -m32 -O2 -msse2 this currently generates:
foo:    movl    p, %eax
        xorl    %edx, %edx
        movd    %edx, %xmm1
        movd    %eax, %xmm0
        punpckldq       %xmm1, %xmm0
        movq    y, %xmm1
        pxor    %xmm1, %xmm0
        movq    %xmm0, z
        ret
With this patch we now generate:
foo:    movq    y, %xmm1
        movd    p, %xmm0
        pxor    %xmm1, %xmm0
        movq    %xmm0, z
        ret
2026-07-01 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain) <ZERO_EXTEND>:
Provide costs for the new transformation.
(convert_insn): Implement *zero_extendsidi2 using the backend's
vec_setv2di_0_zero_extendsi_1 pattern (i.e. movd mem, %xmm).
(general_scalar_to_vector_candidate_p): Consider the pattern
(zero_extend:DI (mem:SI ...)) to be a candidate for DImode STV.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-stv-6.c: New test case.
This can be simplified/canonicalized, as (vec_merge (vec_merge a b m) a n)
is (vec_merge a b (m|~n)). This is easy to see, as the first two operands
of a vec_merge may be swapped by inverting the third, i.e.
(vec_merge a b n) is equivalent to (vec_merge b a ~n), and merging
one set of elements from a vector, followed by another set of elements
from the same vector, can be done in a single step/instruction, i.e.
(vec_merge a (vec_merge a b m) n) = (vec_merge a b (m|n)).
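Both identities can be checked exhaustively on a small model. The C sketch below is illustrative only: vec_merge() here is a hypothetical array-based stand-in for the RTL code (lane i is taken from the first operand when bit i of the mask is set), and it tries every pair of 4-lane masks.

```c
#include <stdbool.h>

#define LANES 4

/* Model of RTL (vec_merge x y mask): lane i comes from x when bit i of
   mask is set, otherwise from y. */
static void vec_merge(const int *x, const int *y, unsigned mask, int *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = ((mask >> i) & 1) ? x[i] : y[i];
}

static bool same(const int *p, const int *q)
{
  for (int i = 0; i < LANES; i++)
    if (p[i] != q[i])
      return false;
  return true;
}

/* Checks both rewrites for every pair of masks m, n. */
static bool vec_merge_identities_hold(void)
{
  const int a[LANES] = {1, 2, 3, 4};
  const int b[LANES] = {10, 20, 30, 40};  /* distinct from a */
  const unsigned all = (1u << LANES) - 1;
  for (unsigned m = 0; m <= all; m++)
    for (unsigned n = 0; n <= all; n++)
      {
        int inner[LANES], lhs[LANES], rhs[LANES];
        vec_merge(a, b, m, inner);
        /* (vec_merge (vec_merge a b m) a n) == (vec_merge a b (m|~n)) */
        vec_merge(inner, a, n, lhs);
        vec_merge(a, b, (m | ~n) & all, rhs);
        if (!same(lhs, rhs))
          return false;
        /* (vec_merge a (vec_merge a b m) n) == (vec_merge a b (m|n)) */
        vec_merge(a, inner, n, lhs);
        vec_merge(a, b, m | n, rhs);
        if (!same(lhs, rhs))
          return false;
      }
  return true;
}
```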
With this transformation in simplify-rtx.cc, combine now reports:
After:
bar:    vmovd   %edi, %xmm1
        vpbroadcastd    %xmm1, %xmm0
        ret
2026-07-01 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* simplify-rtx.cc (simplify_context::simplify_ternary_operation)
<case VEC_MERGE>: Simplify a vec_merge of a vec_merge with a
repeated operand.
gcc/testsuite/ChangeLog
* gcc.target/i386/avx2-vpblendd128-3.c: New test case.
Pattern *extzv_<mode>_sll<clobbercc_or_nocc> matches a left shift
followed by ANDing with a contiguous bitmask, which is supposed to be
implemented by instruction RISBG. Vacated bits of a left shift are
zeroed whereas RISBG performs a left rotate, i.e., each bit shifted out
of the leftmost bit position is placed in the rightmost position of the
operand. Thus, those bits are not necessarily zero. However, the insn
does not adapt the bitmask in order to compensate for this. For
example, the pattern matches
r2 = (r3 << 1) & 255
which leads to
risbgn %r2,%r3,56,128+63,1
whereas expected is
risbgn %r2,%r3,56,128+62,1
Since the bitmask isn't adjusted, the end bit includes the supposedly
vacated bit, which for RISBG equals the rotated-in highest bit of the
source instead of always being zero.
So far, combine adjusted the bitmask itself, which covered this up.
However, late_combine does not, which is why this has been exposed
since r15-1579-g792f97b44ff.
A similar argument holds for *extzv_<mode>_srl<clobbercc_or_nocc>.
Fixed by adjusting the bitmasks in the output templates.
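The rotate-versus-shift mismatch can be reproduced with a small C model. This sketch assumes a simplified RISBGN with the zero-remaining-bits flag (the 128+ in the end operand): rotate left, keep IBM-numbered bits start..end (bit 0 is the most significant), clear the rest. The wrap-around case start > end is not modeled, and ibm_bit_range_mask(), rotl64() and risbgn_zero() are hypothetical helpers.

```c
#include <stdint.h>

/* Mask of IBM-numbered bits start..end of a 64-bit register, where bit 0
   is the most significant bit.  Assumes start <= end <= 63. */
static uint64_t ibm_bit_range_mask(unsigned start, unsigned end)
{
  unsigned width = end - start + 1;
  uint64_t ones = width == 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
  return ones << (63 - end);
}

static uint64_t rotl64(uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Simplified RISBGN with the zero-remaining-bits flag set. */
static uint64_t risbgn_zero(uint64_t src, unsigned start, unsigned end,
                            unsigned rot)
{
  return rotl64(src, rot) & ibm_bit_range_mask(start, end);
}
```

With r3 = 0x8000000000000001, (r3 << 1) & 255 is 2, but risbgn_zero(r3, 56, 63, 1) is 3 because the rotate brings the top bit into bit 63; ending the range at 62 gives the expected 2.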
PR target/126054
gcc/ChangeLog:
* config/s390/s390.md: Fix
extzv_<mode>_{srl,sll}<clobbercc_or_nocc> by adjusting the
bitmasks in the output templates.
Jakub Jelinek [Wed, 1 Jul 2026 10:36:00 +0000 (12:36 +0200)]
aarch64: Remove some spurious semicolons
When building with GCC 8, I'm seeing
../../gcc/config/aarch64/aarch64-elf-metadata.cc:71:2: warning: extra ‘;’ [-Wpedantic]
../../gcc/config/aarch64/aarch64-sve-builtins-shapes.cc:5456:17: warning: extra ‘;’ [-Wpedantic]
../../gcc/config/aarch64/aarch64-sve-builtins-shapes.cc:5493:22: warning: extra ‘;’ [-Wpedantic]
../../gcc/config/aarch64/tuning_models/neoversev2.h:320:2: warning: extra ‘;’ [-Wpedantic]
warnings. The following patch fixes that.
Jakub Jelinek [Wed, 1 Jul 2026 10:33:57 +0000 (12:33 +0200)]
aarch64: Fix recent changes to be C++14 compatible
Last night I noticed
../../gcc/config/aarch64/aarch64-neon-builtins-base.h:23:11: warning: nested namespace definitions only available with -std=c++17 or -std=gnu++17 [-Wpedantic]
../../gcc/config/aarch64/aarch64-neon-builtins-shapes.cc:128:11: warning: nested namespace definitions only available with -std=c++17 or -std=gnu++17 [-Wpedantic]
../../gcc/config/aarch64/aarch64-neon-builtins-shapes.h:23:11: warning: nested namespace definitions only available with -std=c++17 or -std=gnu++17 [-Wpedantic]
warnings. GCC 17 is still supposed to be buildable by C++14 compilers
(including GCC 5.4).
The following patch fixes that to use what the backend uses elsewhere.
The existing tests use inline asm to force operands into FPRs so that
Future VSX arithmetic instructions are generated.
Remove the operand forcing and let the tests check the compiler's
default instruction selection. This allows the tests to catch any
future changes in instruction selection caused by alternative
reordering.
Anlai Lu [Tue, 30 Jun 2026 12:49:02 +0000 (12:49 +0000)]
libstdc++: Add stream state tests for chrono operator<<
Add tests covering formatted output function semantics for the
chrono operator<< overloads. These verify that the new stack-buffer
implementation correctly preserves sentry construction (badbit check
and tied stream flush), width/fill padding with all three alignment
modes (right, left, internal), width reset after each output, and
locale-aware output. A weekday_indexed content test and a wchar_t
smoke test are included.
libstdc++-v3/ChangeLog:
* testsuite/std/time/ostream_insert.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Anlai Lu <agicy@qq.com>
Alfie Richards [Wed, 20 May 2026 14:50:25 +0000 (14:50 +0000)]
vect: Introduce LOOP_VINFO_IV_INCREMENT
Introduce LOOP_VINFO_IV_INCREMENT, which stores the number of scalar
iterations an iteration of the vectorized loop has processed, and change
IV updates and reductions to use this value instead of VF.
Update the IV update logic so that IV_INCREMENT can be loop-variant.
Simplify/remove the SELECT_VL IV increment special cases by using
IV_INCREMENT instead.
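As a conceptual illustration (a scalar C model, not vectorizer output), the loop below processes up to VF elements per "vector" iteration and advances its induction variable by the number of elements actually handled, which is the role IV_INCREMENT plays when it is loop-variant, as with SELECT_VL. VF = 4 and the function add_arrays() are hypothetical.

```c
#include <stddef.h>

#define VF 4  /* hypothetical vectorization factor */

/* Scalar model of a vector loop with a loop-variant IV increment.
   Returns the number of "vector" iterations executed. */
static size_t add_arrays(int *dst, const int *a, const int *b, size_t n)
{
  size_t iterations = 0;
  for (size_t i = 0; i < n;)
    {
      size_t remaining = n - i;
      /* Elements handled this iteration: the "IV increment". */
      size_t incr = remaining < VF ? remaining : VF;
      for (size_t lane = 0; lane < incr; lane++)  /* one "vector" op */
        dst[i + lane] = a[i + lane] + b[i + lane];
      i += incr;  /* advance by elements processed, not by VF */
      iterations++;
    }
  return iterations;
}
```

For n = 10 this runs three iterations handling 4, 4 and 2 elements; a fixed VF step would only be correct if the last iteration were masked separately.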
As part of this change, the grouped load/store data pointer logic changes
from:
* tree-ssa-loop-manip.h: Add argument to create_iv.
* tree-ssa-loop-manip.cc (create_iv): Change step to not
necessarily be loop invariant.
* tree-vect-data-refs.cc (bump_vector_ptr): Remove logic for
updating use-def chain.
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly):
Change to use LOOP_VINFO_IV_INCREMENT instead of VF.
(vect_set_loop_condition_partial_vectors_avx512):
Change to use LOOP_VINFO_IV_INCREMENT instead of VF.
(vect_gen_vector_loop_niters): Update to handle
LOOP_VINFO_IV_INCREMENT.
(vect_get_loop_iv_increment): New function.
* tree-vect-loop.cc (vectorizable_induction): Update to use
LOOP_VINFO_IV_INCREMENT instead of VF, and remove SELECT_VL
logic.
(vect_update_ivs_after_vectorizer_for_early_breaks):
Update to use LOOP_VINFO_IV_INCREMENT over VF and move IV
update to vect_iv_increment_position.
(vect_transform_loop): Add initialization of
LOOP_VINFO_IV_INCREMENT.
* tree-vect-stmts.cc (vect_get_strided_load_store_ops):
Update to use dr_increment and dr_bump, remove SELECT_VL logic.
(vect_get_data_ptr_increment): Change to return the increment
needed to advance the pointer to the next iteration.
(vect_get_data_ptr_bump): New function.
(vectorizable_scan_store): Update to use vect_get_data_ptr_bump
and to remove SELECT_VL logic.
(vectorizable_store): Update to use vect_get_data_ptr_bump and
vect_get_data_ptr_increment.
(vectorizable_load): Update to use vect_get_data_ptr_bump and
vect_get_data_ptr_increment.
* tree-vectorizer.h (loop_vec_info): Add iv_increment field.
(LOOP_VINFO_IV_INCREMENT): New macro.
(LOOP_VINFO_IV_INCREMENT_INVARIANT_P): New macro.
(bump_vector_ptr): Remove incr_ptr argument.
(vect_get_loop_iv_increment): New function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112399.c: Update for different
scheduling.
Jonathan Wakely [Sat, 6 Jun 2026 20:44:56 +0000 (21:44 +0100)]
libstdc++: Use realpath for /etc/localtime symlink [PR125467]
Although the systemd docs say that /etc/localtime should be a symlink to
one of the zoneinfo files, some systems make it a symlink to another
path, where that second path is a symlink to a zoneinfo file (e.g. if
/etc is mounted read-only then /etc/localtime can be a symlink to
another symlink on a writable disk, so that the system timezone can be
altered by re-pointing the symlink on the writable disk).
In that case, using readlink would only tell us the location of the
second symlink, not which zoneinfo file it points to. Therefore, we
would not be able to extract a valid time zone name from the path, and
chrono::current_zone() would fail.
To support multiple symlinks we could recursively keep resolving
symlinks with readlink until we reach a path from which we can extract a
zone name. Alternatively, we can just use realpath to resolve all
symlinks to a physical file (which is what HowardHinnant/date does).
This means we only need one call and don't need the extra
complexity of calling readlink in a loop.
The realpath function also removes redundant slashes, so we can remove
the code that did that manually.
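A minimal POSIX C sketch of the approach, assuming the zone name is whatever follows "/zoneinfo/" in the fully resolved path. The helpers zone_name_from_path() and current_zone_name() are hypothetical and do not reproduce the actual tzdb.cc logic, which also validates the name against the chrono::tzdb object.

```c
#define _XOPEN_SOURCE 700  /* realpath(path, NULL) is POSIX.1-2008 */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns the part of 'path' after "/zoneinfo/", or NULL if absent. */
static const char *zone_name_from_path(const char *path)
{
  const char *marker = "/zoneinfo/";
  const char *p = strstr(path, marker);
  return p ? p + strlen(marker) : NULL;
}

/* Resolves the whole symlink chain with one realpath call and copies the
   zone name into buf.  Returns 0 on success, -1 otherwise. */
static int current_zone_name(const char *link, char *buf, size_t bufsize)
{
  char *resolved = realpath(link, NULL);  /* malloc'd; NULL on failure */
  if (!resolved)
    return -1;  /* e.g. a dangling symlink */
  const char *zone = zone_name_from_path(resolved);
  int rc = -1;
  if (zone && strlen(zone) < bufsize)
    {
      strcpy(buf, zone);
      rc = 0;
    }
  free(resolved);
  return rc;
}
```

Calling current_zone_name("/etc/localtime", buf, sizeof buf) follows every intermediate symlink, which a single readlink call would not.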
The possible downsides of this approach that I'm aware of are:
- When /etc/localtime is a symlink to /invalid/Europe/London but that
file doesn't exist. With the previous implementation we would have
resolved that symlink to the zone "Europe/London" as long as that name
is known to the current chrono::tzdb object. With this change, we
won't get a valid zone name and current_zone() will fail. I'm not sure
how realistic this case is. It might be plausible if libstdc++ is
using the embedded static copy of tzdata.zi and there are no zoneinfo
files on disk at all. In that case the system might still use
/etc/localtime to name a zone, even though the symlink is dangling.
We could fall back to filesystem::weakly_canonical for this case, but
this patch leaves that for a future change, if it turns out to be
needed by any users.
- When /etc/localtime is a symlink to /usr/share/zoneinfo/Foo/Bar where
"Foo/Bar" is a valid zone in the chrono::tzdb object, but the Bar file
is another symlink to ./Baz where "Foo/Baz" is also a valid zone.
With the previous implementation current_zone() would have returned
the "Foo/Bar" zone. With this change it would return "Foo/Baz". I
don't think it's realistic to have two zones which are distinct zones
(not a Zone and a Link to it) but where one of them is defined on-disk
using a symlink to the other.
libstdc++-v3/ChangeLog:
PR libstdc++/125467
* src/c++20/tzdb.cc (tzdb::current_zone): Use realpath to
resolve the /etc/localtime symlink instead of readlink.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Mon, 22 Jun 2026 13:41:02 +0000 (14:41 +0100)]
libstdc++: Remove dependency on std::atomic<unsigned> in tzdb.cc
My r17-471-ge79f0f818c0e42 change to optimize handling of leap seconds
introduced a hard dependency on std::atomic<unsigned>, which causes
problems for targets without atomic word operations, like Cortex-M0:
https://gcc.gnu.org/pipermail/gcc-patches/2026-June/719704.html
This patch replaces the num_leap_seconds variable with a struct which
decides whether to use std::atomic_ref<unsigned> or perform all accesses
while holding a lock on the pre-existing mutex used for the tzdb_list
singleton.
The workaround is a bit ugly, because it assumes that there is only one
caller of num_leap_seconds.set and that the list_mutex() is locked by
that caller iff the tzdb_list doesn't use atomic<shared_ptr<>>. To make
the assumption explicit, there are two different functions used to
update the value, depending on whether the mutex is used or not.
libstdc++-v3/ChangeLog:
* src/c++20/tzdb.cc (_Node::NumLeapSeconds): New class.
(_Node::num_leap_seconds): New static variable.
(num_leap_seconds): Remove.
(__detail::__recent_leap_second_info): Replace uses of
num_leap_seconds with _Node::num_leap_seconds.
(_Node::_S_replace_head): Likewise.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Yi Chen [Tue, 30 Jun 2026 10:26:09 +0000 (10:26 +0000)]
libstdc++: Add Hygon x86 RNG instructions support to std::random_device
Adapt std::random_device to use the Hygon x86 hardware RNG instructions
RDRAND and RDSEED, bringing Hygon C86 platforms to parity with Intel and AMD.
Prior to this change Hygon was absent from the CPUID vendor allowlists that
gate RDRAND/RDSEED in random_device::_M_init, so std::random_device fell
through to a software RNG.
Hygon C86 CPUs report the HygonGenuine vendor signature (CPUID leaf 0
ebx 0x6f677948) and support both RDRAND and RDSEED. Add signature_HYGON_ebx
to the RDSEED and RDRAND vendor allowlists in random_device::_M_init, matching
the existing ebx-only Intel/AMD pattern. With this, the explicit "rdseed" and
"rdrand" token paths select sources on Hygon in exactly the same order as on
Intel/AMD.
For the default device (the "default" token, which is treated as "any"), Hygon prefers RDRAND,
the higher-throughput instruction, suited to bulk generation; RDSEED provides
higher-entropy seed material at lower throughput and remains fully accessible
via the explicit "rdseed" token and the "hw"/"hardware" tokens. Choosing RDRAND
for the Hygon default therefore favors throughput for the common case while
preserving RDSEED access for security-sensitive callers, and the entropy()
guarantee is unchanged (32 bits per call for both paths). The default policy is
handled by a dedicated check at the top of _M_init, kept separate from the
explicit-token paths so their selection order stays identical to Intel/AMD.
The existing RDSEED-first default policy is retained for Intel and AMD; their
behavior is entirely unchanged.
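The default-device policy described above can be sketched in C as follows. This is a hypothetical model, not random.cc: the VENDOR_EBX_* constants are the CPUID leaf 0 EBX values (the first four vendor-string bytes, little-endian), SRC_SOFTWARE stands for whatever non-hardware source would otherwise be used, and the Hygon fallback to RDSEED when RDRAND is unavailable is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0 EBX: first four bytes of the vendor string, little-endian. */
#define VENDOR_EBX_INTEL 0x756e6547u  /* "Genu"ineIntel */
#define VENDOR_EBX_AMD   0x68747541u  /* "Auth"enticAMD */
#define VENDOR_EBX_HYGON 0x6f677948u  /* "Hygo"nGenuine */

enum rng_source { SRC_RDSEED, SRC_RDRAND, SRC_SOFTWARE };

/* Hypothetical model of the "default" token's source selection. */
static enum rng_source pick_default_source(uint32_t vendor_ebx,
                                           bool has_rdseed, bool has_rdrand)
{
  if (vendor_ebx == VENDOR_EBX_HYGON)  /* dedicated check, done first */
    {
      if (has_rdrand)
        return SRC_RDRAND;  /* throughput first */
      if (has_rdseed)
        return SRC_RDSEED;  /* assumed fallback */
      return SRC_SOFTWARE;
    }
  if (vendor_ebx == VENDOR_EBX_INTEL || vendor_ebx == VENDOR_EBX_AMD)
    {
      if (has_rdseed)
        return SRC_RDSEED;  /* unchanged RDSEED-first policy */
      if (has_rdrand)
        return SRC_RDRAND;
    }
  return SRC_SOFTWARE;
}
```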
Tested on Hygon C86 7185: random_device("default"), ("rdrand") and ("rdseed")
all succeed; entropy() returns 32 for each; the existing 94087.cc test, which
exercises the default-token path, passes on Hygon.
* src/c++11/random.cc (random_device::_M_init): Add signature_HYGON_ebx to
the RDSEED and RDRAND CPUID vendor allowlists. For the default device on
Hygon, prefer RDRAND via a dedicated check at the top of the function.
The function attempts to provide a topological sorting of expressions
in a bitmap set, but it falls short of that because we have no knowledge
of the entries of the value graph. The following makes sure to first
compute those. This should reduce the amount of iteration we do.
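Why knowing the entries first helps can be shown with a generic topological sort (Kahn's algorithm) over a small dependency matrix. This C sketch is not GCC's sorted_array_from_bitmap_set, which works on bitmap sets of value numbers. It assumes dep[u][v] means u must be emitted before v and at most MAX_NODES nodes. Once the entries (nodes with no incoming edges) are known, one worklist pass yields the order without repeated scans.

```c
#include <stdbool.h>

#define MAX_NODES 16

/* Writes a topological order into order[] and returns the number of nodes
   placed, which equals n iff the graph is acyclic.  Assumes n <= MAX_NODES. */
static int topo_sort_from_entries(int n, bool dep[MAX_NODES][MAX_NODES],
                                  int order[MAX_NODES])
{
  int indeg[MAX_NODES] = {0};
  int queue[MAX_NODES], head = 0, tail = 0, count = 0;
  for (int u = 0; u < n; u++)
    for (int v = 0; v < n; v++)
      if (dep[u][v])
        indeg[v]++;
  /* Step 1: compute the entries of the graph. */
  for (int v = 0; v < n; v++)
    if (indeg[v] == 0)
      queue[tail++] = v;
  /* Step 2: a single worklist pass starting from the entries. */
  while (head < tail)
    {
      int u = queue[head++];
      order[count++] = u;
      for (int v = 0; v < n; v++)
        if (dep[u][v] && --indeg[v] == 0)
          queue[tail++] = v;
    }
  return count;
}
```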
PR tree-optimization/125040
* tree-ssa-pre.cc (sorted_array_from_bitmap_set): Compute
the set of entries to the value graph before building the
final sorted expression set.
Like the gas case, after the removal of HAVE_GNU_LD and the gnu_ld
variable, the --with-gnu-ld option has become unnecessary, too, so this
patch removes it together with the gnu_ld_flag variable.
Given that the vast majority of configurations use GNU ld or a
compatible linker, the GNU ld annotation to the DEFAULT_LINKER configure
message carries little additional information and is also removed.
With the removal of HAVE_GNU_LD, the gnu_ld variable used in config.gcc
etc. has few uses left, so this patch removes or replaces them:
* In config.gcc:
hppa*64*-*-hpux11*: According to install.tex, this configuration
requires the native HP linker, so the target_cpu_default setting for
gld is removed.
ia64*-*-elf*: This configuration is gld-only, so target_cpu_default is
set unconditionally.
mips*-*-*: After the removal of IRIX support, MIPS configurations are
also gld-only, so target_cpu_default2 is set unconditionally.
* In configure.ac, checks of gnu_ld are changed to check ld_flavor instead.
* The gnu_ld checks for the --version-script and -soname options are
replaced by checking the ld --help output, matching what is done for
-Bstatic/-Bdynamic.
The references to jit/Makefile.in are no longer true, thus removed.
* The gnu_ld check for the --demangle option isn't necessary since the
test already checks ld --help output.
Similar to HAVE_GNU_AS, there's only a single use of HAVE_GNU_LD left,
i.e. linker support for GNU style response files. Therefore this patch
replaces it with the result of a new configure test, HAVE_LD_AT_FILE.
Apart from that, there's a reference in alpha/vms.h. However, there's
no documentation on the support status of the alpha*-dec-*vms*
configuration. The last non-mechanical change to VMS files in
gcc/config dates back to 2014, so I've left that alone.
The removal of HAVE_GNU_AS and the gas variable in config.gcc etc. also
makes the --with-gnu-as configure option unnecessary. This patch
removes it together with gas_flag.
The latter is also used to annotate the DEFAULT_ASSEMBLER message in
configure.ac. Given that the vast majority of configurations use gas,
this doesn't carry much information and is also removed.
Eric Botcazou [Wed, 1 Jul 2026 08:23:46 +0000 (10:23 +0200)]
PTA: Fix wrong optimization of conditional dynamic allocation
This is a regression present on mainline, 16, 15 and 14 branches introduced
by the fix for PR tree-optimization/112653 (PTA and return). What happens
is that DSE incorrectly eliminates a call to memcpy, whose destination is
obtained from (an equivalent of) malloc and is ultimately returned from the
function. But this happens only when the dynamic allocation is conditional.
The difference between the unconditional and conditional cases is:
ESCAPED_RETURN = { ESCAPED NONLOCAL HEAP(30) }
vs
ESCAPED_RETURN = { ANYTHING }
The fix is to make set_uids_in_ptset treat ANYTHING in the escaped
return case the same way as in the escaped case.
gcc/
* tree-ssa-structalias.cc (set_uids_in_ptset): If ANYTHING is
present in the ESCAPED_RETURN solution, record that the global
solution has an escaped heap if FROM contains a heap variable.
gcc/testsuite/
* gnat.dg/opt109.adb: New test.
* gnat.dg/opt109_pkg.ads, gnat.dg/opt109_pkg.adb: New helper.