Martin Jambor [Thu, 17 Jul 2025 16:28:33 +0000 (18:28 +0200)]
tree-sra: Fix grp_covered flag computation when totally scalarizing (PR117423)
Testcase of PR 117423 shows a flaw in the fancy way we do "total
scalarization" in SRA now. We use the types encountered in the
function body and not in type declaration (allowing us to totally
scalarize when only one union field is ever used, since we effectively
"skip" the union then) and can accommodate pre-existing accesses that
happen to fall into padding.
In this case, we skipped the union (bypassing the
totally_scalarizable_type_p check) and the access falling into the
"padding" is an aggregate and so not a candidate for SRA but actually
containing data. Arguably total scalarization should just bail out
when it encounters this situation (but I decided not to depend on this
mainly because we'd need to detect all cases when we eventually cannot
scalarize, such as when a scalar access has children accesses) but the
actual bug is that the detection if all data in an aggregate is indeed
covered by replacements just assumes that is always the case if total
scalarization triggers which however may not be the case in cases like
this - and perhaps more.
This patch fixes the bug by just assuming that all padding is taken
care of when total scalarization triggered, not that every access was
actually scalarized.
gcc/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* tree-sra.cc (analyze_access_subtree): Fix computation of grp_covered
flag.
gcc/testsuite/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* gcc.dg/tree-ssa/pr117423.c: New test.
Jeff Law [Fri, 12 Sep 2025 22:08:38 +0000 (16:08 -0600)]
Fix latent LRA bug
Shreya's work to add the addptr pattern on the RISC-V port exposed a latent bug
in LRA.
We lazily allocate/reallocate the ira_reg_equiv structure and when we do
(re)allocation we'll over-allocate and zero-fill so that we don't have to
actually allocate and relocate the data so often.
In the case exposed by Shreya's work we had N requested entries at the last
rellocation step. We actually allocate N+M entries. During LRA we allocate
enough new pseudos and thus have N+M+1 pseudos.
In get_equiv we read ira_reg_equiv[regno] without bounds checking so we read
past the allocated part of the array and get back junk which we use and
depending on the precise contents we fault in various fun and interesting ways.
We could either arrange to re-allocate ira_reg_equiv again on some path through
LRA (possibly in get_equiv itself). We could also just insert the bounds check
in get_equiv like is done elsewhere in LRA. Vlad indicated no strong
preference in an email last week.
So this just adds the bounds check in a manner similar to what's done elsewhere
in LRA. Bootstrapped and regression tested on x86_64 as well as RISC-V with
Shreya's work enabled and regtested across the various embedded targets.
gcc/
* lra-constraints.cc (get_equiv): Bounds check before accessing
data in ira_reg_equiv.
There are at least two cases where tree-switch-conversion leads
to unpleasant resource allocation:
PR49857
The lookup table lives in RAM. This is the case for all
devices that locate .rodata in RAM, which is for almost
all AVR devices.
PR81540
Code is bloated for 64-bit inputs.
As far as PR49857 is concerned, a target hook that may add an
address-space qualifier to the lookup table is the obvious
solution, though a respective patch has always been rejected by
global maintainers for non-technical reasons.
This also covers bad_function_call::what from C++11.
libstdc++-v3/ChangeLog:
* doc/html/manual/status.html: Regenerate.
* doc/xml/manual/status_cxx2011.xml: Add entry for bad_function_call.
* doc/xml/manual/status_cxx2017.xml: Add entries for bad_any_cast
and nullptr_t output. Update entry for sf.cmath. Fix stable name for
mem.res.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 39d7c4d42a764a86644198a517f58a94f467cdbd)
Tomasz Kamiński [Fri, 5 Sep 2025 11:16:40 +0000 (13:16 +0200)]
libstdc++: Document missing implementation defined behavior for std::filesystem.
libstdc++-v3/ChangeLog:
* doc/html/manual/status.html: Regenerate the file.
* doc/xml/manual/status_cxx2017.xml: Addd more entires.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit d6c370b8e96d43448537276d91c2b33fedb9754a)
Jonathan Wakely [Tue, 2 Sep 2025 21:30:46 +0000 (22:30 +0100)]
libstdc++: Make CTAD ignore pair(const T1&, const T2&) constructor [PR110853]
For the pair(T1, T2) explicit deduction type to decay its arguments as
intended, we need the pair(const T1&, const T2&) constructor to not be
used for CTAD. Otherwise we try to instantiate pair<T1, T2> without
decaying, which is ill-formed for function lvalues.
Use std::type_identity_t<T1> to make the constructor unusable for an
implicit deduction guide.
libstdc++-v3/ChangeLog:
PR libstdc++/110853
* include/bits/stl_pair.h [C++20] (pair(const T1&, const T2&)):
Use std::type_identity_t<T1> for first parameter.
* testsuite/20_util/pair/cons/110853.cc: New test.
Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 0bb0d1d2880d562298eeec8eee4ab4e8ba943260)
Jonathan Wakely [Mon, 1 Sep 2025 17:12:27 +0000 (18:12 +0100)]
libstdc++: Fix std::get<T> for std::pair with reference members [PR121745]
Make the std::get<T> overloads for rvalues use std::forward<T>(p.first)
not std::move(p.first), so that lvalue reference members are not
incorrectly converted to rvalues.
It might appear that std::move(p).first would also work, but the
language rules say that for std::pair<T&&, U> that would produce T&
rather than the expected T&& (see the discussion in P2445R1 §8.2).
Additional tests are added to verify all combinations of reference
members, value categories, and const-qualification.
libstdc++-v3/ChangeLog:
PR libstdc++/121745
* include/bits/stl_pair.h (get): Use forward instead of move in
std::get<T> overloads for rvalue pairs.
* testsuite/20_util/pair/astuple/get_by_type.cc: Check all value
categories and cv-qualification.
aarch64: Fix mode mismatch when building a predicate [PR121118]
This PR is about a case where we used aarch64_expand_sve_const_pred_trn
to combine two predicates, one of which was constructing using
aarch64_sve_move_pred_via_while. The former requires the inputs
to have mode VNx16BI, but the latter returned VNx8BI for a .H
WHILELO.
The proper fix, used on trunk, is to make the pattern emitted by
aarch64_sve_move_pred_via_while produce an VNx16BI for all element
sizes, since every bit of the result is significant. However,
that required some target-independent changes that are too invasive
to backport. This patch goes for the simpler (but less robust) approach
of using the original pattern and casting it to VNx16BI after the fact.
Since the WHILELO pattern is an unspec, the chances of something
optimising it in a way that changes the undefined bits of the output
should be very low, especially on a release branch. It is still a less
satisfactory fix though.
gcc/
PR target/121118
* config/aarch64/aarch64.cc (aarch64_sve_move_pred_via_while):
Return a VNx16BI predicate.
gcc/testsuite/
PR target/121118
* gcc.target/aarch64/sve/acle/general/pr121118_1.c: New test.
Pengfei Li [Thu, 14 Aug 2025 13:59:48 +0000 (13:59 +0000)]
AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]
This patch fixes incorrect constraints in RTL patterns for AArch64 SVE
gather/scatter with type widening/narrowing and vector-plus-immediate
addressing. The bug leads to below "immediate offset out of range"
errors during assembly, eventually causing compilation failures.
/tmp/ccsVqBp1.s: Assembler messages:
/tmp/ccsVqBp1.s:54: Error: immediate offset out of range 0 to 31 at operand 3 -- `ld1b z1.d,p0/z,[z1.d,#64]'
Current RTL patterns for such instructions incorrectly use vgw or vgd
constraints for the immediate operand, base on the vector element type
in Z registers (zN.s or zN.d). However, for gather/scatter with type
conversions, the immediate range for vector-plus-immediate addressing is
determined by the element type in memory, which differs from that in
vector registers. Using the wrong constraint can produce out-of-range
offset values that cannot be encoded in the instruction.
This patch corrects the constraints used in these patterns. A test case
that reproduces the issue is also included.
Bootstrapped and regression-tested on aarch64-linux-gnu.
gcc/ChangeLog:
PR target/121449
* config/aarch64/aarch64-sve.md
(mask_gather_load<mode><v_int_container>): Use vg<Vesize>
constraints for alternatives with immediate offset.
(mask_scatter_store<mode><v_int_container>): Likewise.
gcc/testsuite/ChangeLog:
PR target/121449
* g++.target/aarch64/sve/pr121449.C: New test.
The previous patch for PR121294 handled svtrn1/2, svuzp1/2, and svzip1/2.
This one extends it to handle svrev intrinsics, where the same kind of
wrong code can be generated.
gcc/
PR target/121294
* config/aarch64/aarch64.md (UNSPEC_REV_PRED): New unspec.
* config/aarch64/aarch64-sve.md (@aarch64_sve_rev<mode>_acle)
(*aarch64_sve_rev<mode>_acle): New patterns.
* config/aarch64/aarch64-sve-builtins-base.cc
(svrev_impl::expand): Use the new patterns for boolean svrev.
gcc/testsuite/
PR target/121294
* gcc.target/aarch64/sve/acle/general/rev_2.c: New test.
aarch64: Use VNx16BI for more permutations [PR121294]
The patterns for the predicate forms of svtrn1/2, svuzp1/2,
and svzip1/2 are shared with aarch64_vectorize_vec_perm_const.
The .H, .S, and .D forms operate on VNx8BI, VNx4BI, and VNx2BI
respectively. Thus, for all four element widths, there is one
significant bit per element, for both the inputs and the output.
That's appropriate for aarch64_vectorize_vec_perm_const but not
for the ACLE intrinsics, where every bit of the output is
significant, and where every bit of the selected input elements
is therefore also significant. The current expansion can lead
the optimisers to simplify inputs by changing the upper bits
of the input elements (since the current patterns claim that
those bits don't matter), which in turn leads to wrong code.
The ACLE expansion should operate on VNx16BI instead, for all
element widths.
There was already a pattern for a VNx16BI-only form of TRN1, for
constructing certain predicate constants. The patch generalises it to
handle the other five permutations as well. For the reasons given in
the comments, this is done by making the permutation unspec an operand
to a new UNSPEC_PERMUTE_PRED, rather than overloading the existing
unspecs, and rather than adding a new unspec for each permutation.
gcc/
PR target/121294
* config/aarch64/iterators.md (UNSPEC_TRN1_CONV): Delete.
(UNSPEC_PERMUTE_PRED): New unspec.
* config/aarch64/aarch64-sve.md (@aarch64_sve_trn1_conv<mode>):
Replace with...
(@aarch64_sve_<perm_insn><mode>_acle)
(*aarch64_sve_<perm_insn><mode>_acle): ...these new patterns.
* config/aarch64/aarch64.cc (aarch64_expand_sve_const_pred_trn):
Update accordingly.
* config/aarch64/aarch64-sve-builtins-functions.h
(binary_permute::expand): Use the new _acle patterns for
predicate operations.
H.J. Lu [Thu, 24 Jul 2025 14:38:13 +0000 (07:38 -0700)]
x86: Disallow -mtls-dialect=gnu with no_caller_saved_registers
__tls_get_addr doesn't preserve vector registers. When a function
with no_caller_saved_registers attribute calls __tls_get_addr, YMM
and ZMM registers will be clobbered. Issue an error and suggest
-mtls-dialect=gnu2 in this case.
gcc/
PR target/121208
* config/i386/i386.cc (ix86_tls_get_addr): Issue an error for
-mtls-dialect=gnu with no_caller_saved_registers attribute and
suggest -mtls-dialect=gnu2.
Patrick Palka [Mon, 4 Aug 2025 20:43:33 +0000 (16:43 -0400)]
c++: constexpr evaluation of abi::__dynamic_cast [PR120620]
r13-3299 changed our internal declaration of __dynamic_cast to reside
inside the abi/__cxxabiv1:: namespace instead of the global namespace,
matching the real declaration. This inadvertently made us now attempt
constexpr evaluation of user-written calls to abi::__dynamic_cast since
cxx_dynamic_cast_fn_p now also returns true for them, but we're not
prepared to handle arbitrary calls to __dynamic_cast, and therefore ICE.
This patch restores cxx_dynamic_cast_fn_p to return true only for
synthesized calls to __dynamic_cast, which can be distinguished by
DECL_ARTIFICIAL, since apparently the synthesized declaration of
__dynamic_cast doesn't get merged with the actual declaration.
PR c++/120620
gcc/cp/ChangeLog:
* constexpr.cc (cxx_dynamic_cast_fn_p): Return true only
for synthesized __dynamic_cast.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/constexpr-dynamic19.C: New test.
* g++.dg/cpp2a/constexpr-dynamic1a.C: New test.
testsuite: Fix gcc.target/powerpc/vsx-builtin-7.c test [PR119382]
The test vsx-builtin-7.c failed on powerpc64le-linux due to Identical
Code Folding (ICF) merging the functions insert_di_0_v2 and insert_di_0.
This behavior was introduced by commit r15-7961-gdc47161c1f32c3, which
enhanced alias analysis in ao_compare::compare_ao_refs, enabling the
compiler to identify and optimize structurally identical functions. As a
result, the compiler replaced insert_di_0_v2 with a tail call to
insert_di_0, altering the expected test behavior.
This patch adds -fno-ipa-icf to the test's dg-options to disable ICF,
avoiding function merging and ensuring the test executes correctly.
disabled transformation from "movq $-1,reg" to "pushq $-1; popq reg" for
-Oz. But for legacy integer registers, the former is 4 bytes and the
latter is 3 bytes. Enable such transformation for -Oz.
gcc/
PR target/120427
* config/i386/i386.md (peephole2): Transform "movq $-1,reg" to
"pushq $-1; popq reg" for -Oz if reg is a legacy integer register.
gcc/testsuite/
PR target/120427
* gcc.target/i386/pr120427-5.c: New test.
The termio ioctls are no longer used after commit 59978b21ad9c
("[sanitizer_common] Remove interceptors for deprecated struct termio
(#137403)"), remove them. Fixes this build error:
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:765:27: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
765 | unsigned IOCTL_TCGETA = TCGETA;
| ^~~~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:769:27: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
769 | unsigned IOCTL_TCSETA = TCSETA;
| ^~~~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:770:28: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
770 | unsigned IOCTL_TCSETAF = TCSETAF;
| ^~~~~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cpp:771:28: error: invalid application of ‘sizeof’ to incomplete type ‘__sanitizer::termio’
771 | unsigned IOCTL_TCSETAW = TCSETAW;
| ^~~~~~~
x86: PR target/103773: Fix wrong-code with -Oz from pop to memory.
added "*mov<mode>_and" and extended "*mov<mode>_or" to transform
"mov $0,mem" to the shorter "and $0,mem" and "mov $-1,mem" to the shorter
"or $-1,mem" for -Oz. But the new pattern:
aren't guarded for -Oz. As a result, "and $0,mem" and "or $-1,mem" are
generated without -Oz.
1. Change *mov<mode>_and" to define_insn_and_split and split it to
"mov $0,mem" if not -Oz.
2. Change "*mov<mode>_or" to define_insn_and_split and split it to
"mov $-1,mem" if not -Oz.
3. Don't transform "mov $-1,reg" to "push $-1; pop reg" for -Oz since it
should be transformed to "or $-1,reg".
gcc/
PR target/120427
* config/i386/i386.md (*mov<mode>_and): Changed to
define_insn_and_split. Split it to "mov $0,mem" if not -Oz.
(*mov<mode>_or): Changed to define_insn_and_split. Split it
to "mov $-1,mem" if not -Oz.
(peephole2): Don't transform "mov $-1,reg" to "push $-1; pop reg"
for -Oz since it will be transformed to "or $-1,reg".
gcc/testsuite/
PR target/120427
* gcc.target/i386/cold-attribute-4.c: Compile with -Oz.
* gcc.target/i386/pr120427-1.c: New test.
* gcc.target/i386/pr120427-2.c: Likewise.
* gcc.target/i386/pr120427-3.c: Likewise.
* gcc.target/i386/pr120427-4.c: Likewise.
H.J. Lu [Thu, 3 Jul 2025 02:54:39 +0000 (10:54 +0800)]
x86-64: Add RDI clobber to 64-bit dynamic TLS patterns
*tls_global_dynamic_64_largepic, *tls_local_dynamic_64_<mode> and
*tls_local_dynamic_base_64_largepic use RDI as the __tls_get_addr
argument. Add RDI clobber to these patterns to show it.
gcc/
PR target/120908
* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
gen_tls_local_dynamic_64.
* config/i386/i386.md (*tls_global_dynamic_64_largepic): Add
RDI clobber and use it to generate LEA.
(*tls_local_dynamic_64_<mode>): Likewise.
(*tls_local_dynamic_base_64_largepic): Likewise.
(@tls_local_dynamic_64_<mode>): Add a clobber.
gcc/testsuite/
PR target/120908
* gcc.target/i386/pr120908.c: New test.
H.J. Lu [Tue, 1 Jul 2025 09:17:06 +0000 (17:17 +0800)]
x86-64: Add RDI clobber to tls_global_dynamic_64 patterns
*tls_global_dynamic_64_<mode> uses RDI as the __tls_get_addr argument.
Add RDI clobber to tls_global_dynamic_64 patterns to show it.
PR target/120908
* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
gen_tls_global_dynamic_64.
* config/i386/i386.md (*tls_global_dynamic_64_<mode>): Add RDI
clobber and use it to generate LEA.
(@tls_global_dynamic_64_<mode>): Add a clobber.