Patrick Palka [Wed, 17 Sep 2025 01:00:50 +0000 (21:00 -0400)]
libstdc++: Explicitly pass -Wsystem-headers in tests that need it
When running libstdc++ tests using an installed gcc (as opposed to an
in-tree gcc), we naturally use system stdlib headers instead of the
in-tree headers. But warnings from within system headers are suppressed
by default, so tests that check for such warnings spuriously fail in such
a setup. This patch makes us compile such tests with -Wsystem-headers so
that they consistently pass.
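One way to do that (illustrative; the exact directive used may differ per
test) is a DejaGnu directive in each affected test:

    // { dg-additional-options "-Wsystem-headers" }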
Atomic support is enhanced: the existing atomic_compare_and_swapsi pattern is
fixed to handle side effects, and new atomic_fetch_op and atomic_test_and_set
patterns are added. As MicroBlaze has no QImode test-and-set instruction,
shift magic is used to implement atomic_test_and_set.
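A minimal C sketch of the shift technique, assuming only a word-wide
compare-and-swap is available (names and details are illustrative, not the
actual MicroBlaze expansion):

    #include <stdint.h>

    /* Emulate a byte-sized test-and-set with a 32-bit CAS: find the
       containing word, build masks for the byte, and retry until the
       word-wide exchange succeeds.  */
    static unsigned char
    byte_test_and_set (unsigned char *p)
    {
      uint32_t *wp = (uint32_t *) ((uintptr_t) p & ~(uintptr_t) 3);
      unsigned shift = ((uintptr_t) p & 3) * 8;  /* byte position; endianness
                                                    handling omitted */
      uint32_t mask = (uint32_t) 0xff << shift;
      uint32_t set = (uint32_t) 1 << shift;
      uint32_t old = __atomic_load_n (wp, __ATOMIC_RELAXED);
      uint32_t desired;
      do
        desired = (old & ~mask) | set;    /* store 1 into the byte */
      while (!__atomic_compare_exchange_n (wp, &old, desired, 1,
                                           __ATOMIC_SEQ_CST,
                                           __ATOMIC_RELAXED));
      return (old & mask) != 0;           /* was the byte already set?  */
    }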
When none of -mprefer-vector-width=, avx256_optimal/avx128_optimal, or
avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC sets
ix86_{move_max,store_max} to the maximum available vector length, except
for the AVX part.
So for -mavx2 the vectorizer will choose 256-bit vectors for vectorization,
while 128-bit moves are used for struct copies; this mismatch can cause a
potential store-to-load-forwarding (STLF) issue.
Jeff Law [Fri, 12 Sep 2025 22:08:38 +0000 (16:08 -0600)]
Fix latent LRA bug
Shreya's work to add the addptr pattern on the RISC-V port exposed a latent bug
in LRA.
We lazily allocate/reallocate the ira_reg_equiv structure, and when we do
(re)allocation we over-allocate and zero-fill so that we don't have to
actually allocate and relocate the data so often.
In the case exposed by Shreya's work, we had N requested entries at the last
reallocation step but actually allocated N+M entries. During LRA we then
created enough new pseudos to end up with N+M+1 of them.
In get_equiv we read ira_reg_equiv[regno] without bounds checking, so we read
past the allocated part of the array and get back junk, which we use;
depending on the precise contents we fault in various fun and interesting
ways.
We could either arrange to reallocate ira_reg_equiv again on some path through
LRA (possibly in get_equiv itself), or just insert a bounds check in get_equiv
as is done elsewhere in LRA. Vlad indicated no strong preference in an email
last week.
So this just adds the bounds check in a manner similar to what's done elsewhere
in LRA. Bootstrapped and regression tested on x86_64 as well as RISC-V with
Shreya's work enabled and regtested across the various embedded targets.
gcc/
* lra-constraints.cc (get_equiv): Bounds check before accessing
data in ira_reg_equiv.
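The generic shape of the problem and the fix (an illustrative sketch, not the
actual GCC code): a lazily over-allocated table indexed by pseudo number.

    struct equiv_entry { /* ... equivalence data ... */ };
    static equiv_entry *table;   // valid entries: [0, table_len)
    static int table_len;

    static equiv_entry *
    get_entry (int regno)
    {
      // Pseudos created after the last (re)allocation have no entry;
      // without this check we would read past the end of the array.
      if (regno >= table_len)
        return nullptr;          // treat as "no equivalence recorded"
      return &table[regno];
    }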
aarch64: PR target/121749: Use correct predicate for narrowing shift amounts
With g:d20b2ad845876eec0ee80a3933ad49f9f6c4ee30 the narrowing shift instructions
are now represented with standard RTL and more merging optimisations occur.
This exposed a wrong predicate for the shift amount operand.
The valid shift amount range is determined by the bit width of the narrow
destination elements, not of the input sources; for example, a narrowing
shift to 8-bit elements accepts shift amounts 1-8, not 1-16.
Correct this by using the vn_mode attribute when specifying the predicate, which
exists for this purpose.
I've spotted a few more narrowing shift patterns that need the restriction, so
they are updated as well.
Bootstrapped and tested on aarch64-none-linux-gnu.
s390: Emulate vec_cmp{eq,gt,gtu} for 128-bit integers
Mode iterator V_HW enables V1TI for target VXE, which means vec_cmpv1tiv1ti
becomes available; this leads to an ICE since there is no corresponding insn.
Fixed by emulating comparisons and enabling mode V1TI unconditionally
for V_HW. For the sake of symmetry, I also added TI mode to V_HW since
TF mode is already included. As a consequence the consumers of V_HW
vec_{splat,slb,sld,sldw,sldb,srdb,srab,srb,test_mask_int,test_mask}
also become available for 128-bit integers.
This fixes gcc.c-torture/execute/pr105613.c and gcc.dg/pr106063.c.
* gcc.target/s390/vector/vec-cmp-emu-1.c: New test.
* gcc.target/s390/vector/vec-cmp-emu-2.c: New test.
* gcc.target/s390/vector/vec-cmp-emu-3.c: New test.
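The scalar logic being emulated can be sketched as a decomposition into
64-bit halves (illustrative only; the actual expansion uses vector
instructions, and gtu uses unsigned high-half compares):

    #include <stdint.h>

    /* 128-bit equality: both halves must match.  */
    static int
    eq128 (uint64_t a_hi, uint64_t a_lo, uint64_t b_hi, uint64_t b_lo)
    {
      return a_hi == b_hi && a_lo == b_lo;
    }

    /* 128-bit signed greater-than: compare the high halves signed; on
       equality, compare the low halves unsigned.  */
    static int
    gt128 (int64_t a_hi, uint64_t a_lo, int64_t b_hi, uint64_t b_lo)
    {
      return a_hi > b_hi || (a_hi == b_hi && a_lo > b_lo);
    }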
There are at least two cases where tree-switch-conversion leads
to unpleasant resource allocation:
PR49857
The lookup table lives in RAM. This is the case for all
devices that locate .rodata in RAM, which is almost
all AVR devices.
PR81540
Code is bloated for 64-bit inputs.
As far as PR49857 is concerned, a target hook that may add an
address-space qualifier to the lookup table is the obvious
solution, though such a patch has always been rejected by
global maintainers for non-technical reasons.
This also covers bad_function_call::what from C++11.
libstdc++-v3/ChangeLog:
* doc/html/manual/status.html: Regenerate.
* doc/xml/manual/status_cxx2011.xml: Add entry for bad_function_call.
* doc/xml/manual/status_cxx2017.xml: Add entries for bad_any_cast
and nullptr_t output. Update entry for sf.cmath. Fix stable name for
mem.res.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 39d7c4d42a764a86644198a517f58a94f467cdbd)
Tomasz Kamiński [Fri, 5 Sep 2025 11:16:40 +0000 (13:16 +0200)]
libstdc++: Document missing implementation-defined behavior for std::filesystem.
libstdc++-v3/ChangeLog:
* doc/html/manual/status.html: Regenerate the file.
* doc/xml/manual/status_cxx2017.xml: Add more entries.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit d6c370b8e96d43448537276d91c2b33fedb9754a)
Jonathan Wakely [Tue, 2 Sep 2025 21:30:46 +0000 (22:30 +0100)]
libstdc++: Make CTAD ignore pair(const T1&, const T2&) constructor [PR110853]
For the pair(T1, T2) explicit deduction guide to decay its arguments as
intended, we need the pair(const T1&, const T2&) constructor to not be
used for CTAD. Otherwise we try to instantiate pair<T1, T2> without
decaying, which is ill-formed for function lvalues.
Use std::type_identity_t<T1> to make the constructor unusable for an
implicit deduction guide.
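An illustrative case (assumed from the PR) where the undecayed instantiation
is ill-formed:

    #include <utility>

    void f();

    // CTAD should use the pair(T1, T2) guide and decay f to void(*)():
    std::pair p(f, 42);   // std::pair<void (*)(), int>

    // An implicit guide formed from pair(const T1&, const T2&) would try
    // to instantiate std::pair<void(), int>, which is ill-formed.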
libstdc++-v3/ChangeLog:
PR libstdc++/110853
* include/bits/stl_pair.h [C++20] (pair(const T1&, const T2&)):
Use std::type_identity_t<T1> for first parameter.
* testsuite/20_util/pair/cons/110853.cc: New test.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 0bb0d1d2880d562298eeec8eee4ab4e8ba943260)
Jonathan Wakely [Mon, 1 Sep 2025 17:12:27 +0000 (18:12 +0100)]
libstdc++: Fix std::get<T> for std::pair with reference members [PR121745]
Make the std::get<T> overloads for rvalues use std::forward<T>(p.first)
not std::move(p.first), so that lvalue reference members are not
incorrectly converted to rvalues.
It might appear that std::move(p).first would also work, but the
language rules say that for std::pair<T&&, U> that would produce T&
rather than the expected T&& (see the discussion in P2445R1 §8.2).
Additional tests are added to verify all combinations of reference
members, value categories, and const-qualification.
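A sketch of the preserved value category (illustrative):

    #include <type_traits>
    #include <utility>

    void
    check ()
    {
      int i = 0;
      std::pair<int&, char> p(i, 'x');
      // For an rvalue pair, a member of lvalue-reference type must still
      // be yielded as an lvalue reference, not converted to an rvalue:
      static_assert (std::is_same_v<decltype(std::get<int&>(std::move(p))),
                                    int&>);
    }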
libstdc++-v3/ChangeLog:
PR libstdc++/121745
* include/bits/stl_pair.h (get): Use forward instead of move in
std::get<T> overloads for rvalue pairs.
* testsuite/20_util/pair/astuple/get_by_type.cc: Check all value
categories and cv-qualification.
Jonathan Wakely [Wed, 24 Jul 2024 17:08:03 +0000 (18:08 +0100)]
libstdc++: Implement LWG 3836 for std::expected bool conversions
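The problematic conversion looks like this (an assumed illustration of the
LWG 3836 issue):

    #include <expected>   // C++23

    std::expected<bool, int>  a = false;  // holds the value false
    // Without the LWG 3836 constraint, this conversion could use bool(a),
    // which is true because a has a value, instead of copying the stored
    // false:
    std::expected<bool, long> b = a;      // must hold false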
libstdc++-v3/ChangeLog:
* include/std/expected (expected): Constrain constructors to
prevent problematic bool conversions, as per LWG 3836.
* testsuite/20_util/expected/lwg3836.cc: New test.
Jonathan Wakely [Mon, 1 Sep 2025 21:22:20 +0000 (22:22 +0100)]
libstdc++: Fix -Wswitch warning in <regex>
This fixes a warning seen with -Wsystem-headers:
include/c++/14.3.0/bits/regex_compiler.h:191:11: warning: case value '0' not in enumerated type 'std::regex_constants::syntax_option_type' [-Wswitch]
191 | case _FlagT(0):
| ^~~~
There's no diagnostic on trunk since the flag_enum attribute was added
to the enum type in r15-3500-g1914ca8791ce4e.
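The generic shape of such a fix (illustrative, not the exact libstdc++ code):

    enum syntax_option_type { ECMAScript = 1, icase = 2 };

    int
    classify (syntax_option_type f)
    {
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wswitch"
      switch (f)
        {
        case syntax_option_type(0):   // value outside the enumerators
          return 0;
        default:
          return 1;
        }
    #pragma GCC diagnostic pop
    }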
libstdc++-v3/ChangeLog:
* include/bits/regex_compiler.h (_Compiler::_S_validate): Add
diagnostic pragma to disable -Wswitch warning.
Jonathan Wakely [Tue, 19 Aug 2025 17:02:53 +0000 (18:02 +0100)]
libstdc++: Check _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK with #if [PR121496]
The change in r14-905-g3b7cb33033fbe6 to disable the use of
pthread_mutex_clocklock when TSan is active assumed that the
_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK macro was always checked with #if
rather than #ifdef, which was not true.
This makes the checks use #if consistently.
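The distinction matters because the macro may now be defined to 0 (a sketch;
the exact definition differs):

    #define _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK 0   // e.g. under TSan

    #ifdef _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK  // wrong: defined, so taken
    #endif

    #if _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK     // right: 0, so skipped
    #endif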
libstdc++-v3/ChangeLog:
PR libstdc++/121496
* include/std/mutex (__timed_mutex_impl::_M_try_wait_until):
Change preprocessor condition to use #if instead of #ifdef.
(recursive_timed_mutex::_M_clocklock): Likewise.
* testsuite/30_threads/timed_mutex/121496.cc: New test.
When I added this explicit specialization in r14-1433-gf150a084e25eaa I
used the wrong value for the number of mantissa digits (I used 112
instead of 113). Then when I refactored it in r14-1582-g6261d10521f9fd I
used the value calculated from the incorrect value (35 instead of 36).
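For the IEEE binary128 format the correct values are 113 mantissa digits and
max_digits10 == 36, since max_digits10 = ceil(113 * log10(2)) + 1 = 35 + 1.
A sketch, assuming the specialization in question is for a binary128 type
such as _Float128:

    #include <limits>

    static_assert (std::numeric_limits<_Float128>::digits == 113);
    static_assert (std::numeric_limits<_Float128>::max_digits10 == 36);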
Harald Anlauf [Fri, 27 Jun 2025 21:00:48 +0000 (23:00 +0200)]
Fortran: follow-up fix to checking of renamed-on-use interface name [PR120784]
Commit r16-1633 introduced a regression for imported interfaces that were
not renamed-on-use, since the related logic did not take into account that
the absence of renaming could be represented by an empty string.
Martin Jambor [Wed, 23 Jul 2025 09:22:33 +0000 (11:22 +0200)]
tree-sra: Avoid total SRA if there are incompat. aggregate accesses (PR119085)
We currently use the types encountered in the function body, not those in
the type declaration, to perform total scalarization. Bug PR 119085
uncovered that we were missing a check that, when the same data is
accessed with different aggregate types, those types are actually
compatible. Without it, we can base total scalarization on a type that
does not "cover" all live data in a different part of the function. This
patch adds the check.
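A hypothetical shape of the problem (the real testcase is
gcc.dg/tree-ssa/pr119085.c): one memory location accessed through two
incompatible aggregate types.

    union u
    {
      struct inner_a { int i; float f; } a;
      struct inner_b { float f; int i; } b;
    };

    void consume (union u *);

    int
    f (void)
    {
      union u x;
      x.a.i = 1;        /* accessed as struct inner_a ...  */
      x.a.f = 2.0f;
      consume (&x);
      return x.b.i;     /* ... and as the incompatible struct inner_b */
    }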
gcc/ChangeLog:
2025-07-21 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/119085
* tree-sra.cc (sort_and_splice_var_accesses): Prevent total
scalarization if two incompatible aggregates access the same place.
gcc/testsuite/ChangeLog:
2025-07-21 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/119085
* gcc.dg/tree-ssa/pr119085.c: New test.
Martin Jambor [Fri, 18 Jul 2025 10:42:11 +0000 (12:42 +0200)]
tree-sra: Fix grp_covered flag computation when totally scalarizing (PR117423)
The testcase of PR 117423 shows a flaw in the fancy way we now do "total
scalarization" in SRA. We use the types encountered in the
function body, not those in the type declaration (allowing us to totally
scalarize when only one union field is ever used, since we effectively
"skip" the union then), and we can accommodate pre-existing accesses that
happen to fall into padding.
In this case, we skipped the union (bypassing the
totally_scalarizable_type_p check), and the access falling into the
"padding" is an aggregate, so it is not a candidate for SRA but actually
contains data. Arguably, total scalarization should just bail out when it
encounters this situation (I decided not to depend on that, mainly because
we would need to detect all cases where we eventually cannot scalarize,
such as when a scalar access has child accesses). The actual bug, however,
is that the code detecting whether all data in an aggregate is covered by
replacements simply assumes this is always the case whenever total
scalarization triggers, which may not hold in cases like this one, and
perhaps others.
This patch fixes the bug by only assuming that all padding is taken
care of when total scalarization triggered, not that every access was
actually scalarized.
gcc/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* tree-sra.cc (analyze_access_subtree): Fix computation of grp_covered
flag.
gcc/testsuite/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* gcc.dg/tree-ssa/pr117423.c: New test.
aarch64: Fix mode mismatch when building a predicate [PR121118]
This PR is about a case where we used aarch64_expand_sve_const_pred_trn
to combine two predicates, one of which was constructed using
aarch64_sve_move_pred_via_while. The former requires the inputs
to have mode VNx16BI, but the latter returned VNx8BI for a .H
WHILELO.
The proper fix, used on trunk, is to make the pattern emitted by
aarch64_sve_move_pred_via_while produce a VNx16BI for all element
sizes, since every bit of the result is significant. However,
that required some target-independent changes that are too invasive
to backport. This patch goes for the simpler (but less robust) approach
of using the original pattern and casting it to VNx16BI after the fact.
Since the WHILELO pattern is an unspec, the chances of something
optimising it in a way that changes the undefined bits of the output
should be very low, especially on a release branch. It is still a less
satisfactory fix though.
gcc/
PR target/121118
* config/aarch64/aarch64.cc (aarch64_sve_move_pred_via_while):
Return a VNx16BI predicate.
gcc/testsuite/
PR target/121118
* gcc.target/aarch64/sve/acle/general/pr121118_1.c: New test.
Pengfei Li [Thu, 7 Aug 2025 14:52:45 +0000 (14:52 +0000)]
AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]
This patch fixes incorrect constraints in RTL patterns for AArch64 SVE
gather/scatter with type widening/narrowing and vector-plus-immediate
addressing. The bug leads to "immediate offset out of range" errors like
the one below during assembly, eventually causing compilation failures.
/tmp/ccsVqBp1.s: Assembler messages:
/tmp/ccsVqBp1.s:54: Error: immediate offset out of range 0 to 31 at operand 3 -- `ld1b z1.d,p0/z,[z1.d,#64]'
Current RTL patterns for such instructions incorrectly use the vgw or vgd
constraints for the immediate operand, based on the vector element type
in the Z registers (zN.s or zN.d). However, for gather/scatter with type
conversions, the immediate range for vector-plus-immediate addressing is
determined by the element type in memory, which differs from the one in
the vector registers. Using the wrong constraint can produce out-of-range
offset values that cannot be encoded in the instruction.
This patch corrects the constraints used in these patterns. A test case
that reproduces the issue is also included.
Bootstrapped and regression-tested on aarch64-linux-gnu.
gcc/ChangeLog:
PR target/121449
* config/aarch64/aarch64-sve.md
(mask_gather_load<mode><v_int_container>): Use vg<Vesize>
constraints for alternatives with immediate offset.
(mask_scatter_store<mode><v_int_container>): Likewise.
gcc/testsuite/ChangeLog:
PR target/121449
* g++.target/aarch64/sve/pr121449.C: New test.
aarch64: Adapt unwinder to linux's SME signal behaviour
SME uses a lazy save system to manage ZA. The idea is that,
if a function with ZA state wants to call a "normal" function,
it can leave its state in ZA and instead set up a lazy save buffer.
If, unexpectedly, that normal function contains a nested use of ZA,
that nested use of ZA must commit the lazy save first.
This lazy save system uses a special system register called TPIDR2_EL0.
See:
The ABI specifies that, on entry to an exception handler, the following
things must be true:
* PSTATE.SM must be 0 (the processor must be in non-streaming mode)
* PSTATE.ZA must be 0 (ZA must be off)
* TPIDR2_EL0 must be 0 (there must be no uncommitted lazy save)
This is normally done by making _Unwind_RaiseException & friends
commit any lazy save before they unwind. This also has the side
effect of ensuring that TPIDR2_EL0 is never left pointing to a
lazy save buffer that has been unwound.
However, things get more complicated with signals. If:
(a) a signal is raised while ZA is dormant (that is, while there is an
uncommitted lazy save);
(b) the signal handler throws an exception; and
(c) that exception is caught outside the signal handler
something must ensure that the lazy save from (a) is committed.
This would be simple if the signal handler was entered with ZA and
TPIDR2_EL0 intact. However, for various good reasons that are out
of scope here, this is not done. Instead, Linux now clears both
TPIDR2_EL0 and PSTATE.ZA before entering a signal handler, see:
Therefore, it is the unwinder that must simulate a commit of the lazy
save from (a). It can do this by reading the previous values of
TPIDR2_EL0 and ZA from the sigcontext.
The SME-related sigcontext structures were only added to linux's
asm/sigcontext.h relatively recently, and we can't rely on GCC being
built against such recent kernel header files. The patch therefore
defines the relevant macros if they are not defined and provides types
that comply with the ABI layout of the corresponding linux types.
The patch includes some ugly casting in an attempt to support big-endian
ILP32, even though SME on big-endian ILP32 linux should never be a thing.
We can remove it if we also remove ILP32 support from GCC.
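A sketch of such fallback definitions (assumed here; the authoritative names
and layout are those in linux's asm/sigcontext.h):

    #include <stdint.h>

    /* Mirrors linux's struct _aarch64_ctx: the header that starts every
       extra record in the signal frame's reserved area.  */
    struct _aarch64_ctx_fallback
    {
      uint32_t magic;
      uint32_t size;
    };

    #ifndef TPIDR2_MAGIC
    #define TPIDR2_MAGIC 0x54504902

    /* Mirrors linux's struct tpidr2_context: the record that carries the
       value TPIDR2_EL0 had when the signal was raised.  */
    struct tpidr2_context_fallback
    {
      struct _aarch64_ctx_fallback head;
      uint64_t tpidr2;
    };
    #endif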
gcc/testsuite/
* lib/target-supports.exp (add_options_for_aarch64_sme)
(check_effective_target_aarch64_sme_hw): New procedures.
* g++.target/aarch64/sme/sme_throw_1.C: New test.
* g++.target/aarch64/sme/sme_throw_2.C: Likewise.
libgcc/
* config/aarch64/linux-unwind.h (aarch64_fallback_frame_state):
If a signal was raised while there was an uncommitted lazy save,
commit the save as part of the unwind process.
aarch64: Mark SME functions as .variant_pcs [PR121414]
Unlike base PCS functions, __arm_streaming and __arm_streaming_compatible
functions allow/require PSTATE.SM to be 1 on entry, so they need to
be treated as STO_AARCH64_VARIANT_PCS.
Similarly, functions that share ZA or ZT0 with their callers require
ZA to be active on entry, whereas the base PCS requires ZA to be
dormant or off. These functions too need to be marked as having
a variant PCS.
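For example (illustrative):

    // A streaming function allows PSTATE.SM to be 1 on entry, so its
    // symbol must be marked STO_AARCH64_VARIANT_PCS; the compiler now
    // emits a .variant_pcs directive for it in the assembly output.
    void callee () __arm_streaming;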
gcc/
PR target/121414
* config/aarch64/aarch64.cc (aarch64_is_variant_pcs): New function,
split out from...
(aarch64_asm_output_variant_pcs): ...here. Handle various types
of SME function type.
gcc/testsuite/
PR target/121414
* gcc.target/aarch64/sme/pr121414_1.c: New test.
The previous patch for PR121294 handled svtrn1/2, svuzp1/2, and svzip1/2.
This one extends it to handle svrev intrinsics, where the same kind of
wrong code can be generated.
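For example (illustrative ACLE usage; every bit of the result predicate is
significant to later uses):

    #include <arm_sve.h>

    svbool_t
    rev16 (svbool_t p)
    {
      return svrev_b16 (p);   // reverse as 16-bit predicate elements
    }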
gcc/
PR target/121294
* config/aarch64/aarch64.md (UNSPEC_REV_PRED): New unspec.
* config/aarch64/aarch64-sve.md (@aarch64_sve_rev<mode>_acle)
(*aarch64_sve_rev<mode>_acle): New patterns.
* config/aarch64/aarch64-sve-builtins-base.cc
(svrev_impl::expand): Use the new patterns for boolean svrev.
gcc/testsuite/
PR target/121294
* gcc.target/aarch64/sve/acle/general/rev_2.c: New test.
aarch64: Use VNx16BI for more permutations [PR121294]
The patterns for the predicate forms of svtrn1/2, svuzp1/2,
and svzip1/2 are shared with aarch64_vectorize_vec_perm_const.
The .H, .S, and .D forms operate on VNx8BI, VNx4BI, and VNx2BI
respectively. Thus, for all four element widths, there is one
significant bit per element, for both the inputs and the output.
That's appropriate for aarch64_vectorize_vec_perm_const but not
for the ACLE intrinsics, where every bit of the output is
significant, and where every bit of the selected input elements
is therefore also significant. The current expansion can lead
the optimisers to simplify inputs by changing the upper bits
of the input elements (since the current patterns claim that
those bits don't matter), which in turn leads to wrong code.
The ACLE expansion should operate on VNx16BI instead, for all
element widths.
There was already a pattern for a VNx16BI-only form of TRN1, for
constructing certain predicate constants. The patch generalises it to
handle the other five permutations as well. For the reasons given in
the comments, this is done by making the permutation unspec an operand
to a new UNSPEC_PERMUTE_PRED, rather than overloading the existing
unspecs, and rather than adding a new unspec for each permutation.
gcc/
PR target/121294
* config/aarch64/iterators.md (UNSPEC_TRN1_CONV): Delete.
(UNSPEC_PERMUTE_PRED): New unspec.
* config/aarch64/aarch64-sve.md (@aarch64_sve_trn1_conv<mode>):
Replace with...
(@aarch64_sve_<perm_insn><mode>_acle)
(*aarch64_sve_<perm_insn><mode>_acle): ...these new patterns.
* config/aarch64/aarch64.cc (aarch64_expand_sve_const_pred_trn):
Update accordingly.
* config/aarch64/aarch64-sve-builtins-functions.h
(binary_permute::expand): Use the new _acle patterns for
predicate operations.
H.J. Lu [Thu, 24 Jul 2025 14:38:13 +0000 (07:38 -0700)]
x86: Disallow -mtls-dialect=gnu with no_caller_saved_registers
__tls_get_addr doesn't preserve vector registers. When a function
with no_caller_saved_registers attribute calls __tls_get_addr, YMM
and ZMM registers will be clobbered. Issue an error and suggest
-mtls-dialect=gnu2 in this case.
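An illustrative case (hypothetical code) that now gets the error when
compiled with -fpic -mtls-dialect=gnu:

    __thread int counter;

    __attribute__((no_caller_saved_registers))
    int
    bump (void)
    {
      /* Global-dynamic TLS would call __tls_get_addr, clobbering vector
         registers this attribute promises to preserve.  */
      return ++counter;
    }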
gcc/
PR target/121208
* config/i386/i386.cc (ix86_tls_get_addr): Issue an error for
-mtls-dialect=gnu with no_caller_saved_registers attribute and
suggest -mtls-dialect=gnu2.
The rtx cost values defined by the target backend affect the
calculation of register pressure classes in the IRA, and thus affect
scheduling. This can degrade program performance, as observed in
OpenSSL 3.5.1 SHA512 and SPEC CPU 2017 exchange_r.
The problem can be avoided by defining a fixed set of register pressure
classes in the target backend instead of letting the IRA
calculate them automatically.
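A sketch of such a hook (assumed shape; the register class names here are
illustrative, see loongarch.cc for the actual set):

    /* Return a fixed set of pressure classes instead of letting IRA
       derive them from the cost model.  */
    static int
    loongarch_compute_pressure_classes (reg_class *classes)
    {
      int n = 0;
      classes[n++] = GR_REGS;
      classes[n++] = FP_REGS;
      return n;
    }

    #undef  TARGET_COMPUTE_PRESSURE_CLASSES
    #define TARGET_COMPUTE_PRESSURE_CLASSES loongarch_compute_pressure_classes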
gcc/ChangeLog:
PR target/120476
* config/loongarch/loongarch.cc
(loongarch_compute_pressure_classes): New function.
(TARGET_COMPUTE_PRESSURE_CLASSES): Define.
mengqinggang [Fri, 8 Aug 2025 08:22:59 +0000 (16:22 +0800)]
LoongArch: Use macros instead of an enum for the base ABI type
Enum values can't be used in #if: in a #if expression, identifiers that
are not macros are all treated as the number zero.
This patch may fix https://sourceware.org/bugzilla/show_bug.cgi?id=32776.
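A sketch of the pitfall (DEFAULT_ABI_BASE is a hypothetical macro):

    /* As an enumerator, ABI_BASE_LP64D is invisible to the preprocessor: */
    enum { ABI_BASE_LP64D = 0 };
    #if DEFAULT_ABI_BASE == ABI_BASE_LP64D
    /* Both identifiers evaluate as 0 here, so this branch is always
       taken, whatever the real ABI is.  */
    #endif

    /* As a macro, the comparison means what it says: */
    #define ABI_BASE_LP64D 0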
gcc/ChangeLog:
* config/loongarch/loongarch-def.h (ABI_BASE_LP64D): New macro.
(ABI_BASE_LP64F): New macro.
(ABI_BASE_LP64S): New macro.
(N_ABI_BASE_TYPES): New macro.