Eric Botcazou [Mon, 20 Oct 2025 09:21:21 +0000 (11:21 +0200)]
Ada: Fix spurious warning for renaming of component of VFA record
This is a regression present on the mainline and all active branches: the
compiler gives a spurious "is not referenced" warning for the renaming of
a component of a Volatile_Full_Access record.
gcc/ada/
PR ada/107536
* exp_ch2.adb (Expand_Renaming): Mark the entity as referenced.
gcc/testsuite/
* gnat.dg/renaming18.adb: New test.
In this PR we have a reduction of a vector constructor, where the
type of the constructor is int16x8_t and the elements are int16x4_t;
i.e. it represents a concatenation of two vectors.
This triggers a match.pd pattern that looks like it was written to
handle reductions of vector constructors whose elements are scalars,
not vectors. No type check enforces this property, so the pattern
replaces a reduction to scalar with an int16x4_t vector in this case,
which is of course a type error, leading to an invalid-GIMPLE ICE.
This patch adds a type check to the pattern, only going ahead with the
transformation if the element type of the ctor matches that of the
reduction.
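For illustration, a minimal sketch of the shape involved (an assumption about the problem, not the actual PR testcase; the function name is hypothetical):
  #include <arm_neon.h>

  int16_t
  sum (int16x4_t a, int16x4_t b)
  {
    /* vcombine builds a vector constructor whose elements are
       themselves vectors: two int16x4_t forming one int16x8_t.  */
    int16x8_t v = vcombine_s16 (a, b);
    /* The reduction of that constructor to a scalar is what the
       match.pd pattern mishandled.  */
    return vaddvq_s16 (v);
  }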
gcc/ChangeLog:
PR tree-optimization/121772
* match.pd: Add type check to reduc(ctor) pattern.
gcc/testsuite/ChangeLog:
PR tree-optimization/121772
* gcc.target/aarch64/torture/pr121772.c: New test.
Richard Biener [Tue, 14 May 2024 09:13:51 +0000 (11:13 +0200)]
tree-optimization/120156 - ICE in ptr_derefs_may_alias_p
This picks the ptr_derefs_may_alias_p fix from the PR99954 fix,
which said: this makes us run into a latent issue in
ptr_deref_may_alias_decl_p when the pointer is something like &MEM[0].a,
in which case we fail to handle non-SSA-name pointers. Add code
similar to what we have in ptr_derefs_may_alias_p.
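Presumably the added code is a guard of roughly this shape (a sketch using GCC-internal names taken from the commit text; it belongs inside ptr_deref_may_alias_decl_p and is not a standalone program):
  /* Bail out conservatively for non-SSA-name pointers such as
     &MEM[0].a, which the logic below cannot handle.  */
  if (TREE_CODE (ptr) != SSA_NAME)
    return true;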
PR tree-optimization/120156
* tree-ssa-alias.cc (ptr_deref_may_alias_decl_p): Verify
the pointer is an SSA name.
So, as Georg-Johann discusses in the BZ, reload_cse_move2add can generate
incorrect code when optimizing code with clobbers. Specifically, in the
case where we try to optimize a sequence of 4 operations down to 3
operations, we can reset INSN to the next instruction and continue the loop.
That skips the code that invalidates objects based on things like REG_INC
notes, stack pushes and, most importantly, clobbers attached to the current
insn.
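A self-contained toy of that control flow (illustrative names only, not the actual postreload.cc code):
  #include <cstdio>

  struct insn { int id; bool optimized; insn *next; };

  /* Models the invalidation done for REG_INC notes, stack pushes and
     clobbers attached to an insn.  */
  static void invalidate (insn *i) { std::printf ("invalidate %d\n", i->id); }

  static void scan (insn *first)
  {
    for (insn *i = first; i != nullptr; i = i->next)
      {
        if (i->optimized && i->next != nullptr)
          {
            i = i->next;  /* reset to the next instruction...  */
            continue;     /* ...skipping the invalidation below.  */
          }
        invalidate (i);
      }
  }

  int main ()
  {
    insn c = {3, false, nullptr}, b = {2, false, &c}, a = {1, true, &b};
    scan (&a);  /* Prints only "invalidate 3": invalidation for the
                   skipped insns never runs, mirroring the bug.  */
  }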
This patch factors all of the invalidation code used by reload_cse_move2add
into a new function and calls it at the appropriate time.
Georg-Johann has confirmed this patch fixes his avr bug and I've had it in
my tester over the weekend. It's bootstrapped and regression tested on
aarch64, m68k, sh4, alpha and hppa. It's also regression tested successfully
on a wide variety of other targets.
When none of -mprefer-vector-width, avx256_optimal/avx128_optimal, or
avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC will
set ix86_{move_max,store_max} to the maximum available vector length,
except for the AVX part.
So for -mavx2 the vectorizer will choose 256-bit vectors for
vectorization, but 128-bit moves are used for struct copies; there
could be a potential store-to-load-forwarding (STLF) issue due to this
mismatch.
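A hedged sketch (hypothetical code, not the PR testcase) of the two access widths meeting on the same data:
  typedef struct { long long a[4]; } S;  /* 32 bytes */

  long long
  f (S *p, S *q)
  {
    *p = *q;           /* struct copy by pieces: 128-bit moves here */
    long long s = 0;
    for (int i = 0; i < 4; i++)
      s += p->a[i];    /* loop vectorized with 256-bit loads under -mavx2 */
    return s;          /* a 256-bit load from two 128-bit stores can stall STLF */
  }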
Martin Jambor [Wed, 23 Jul 2025 09:22:33 +0000 (11:22 +0200)]
tree-sra: Avoid total SRA if there are incompat. aggregate accesses (PR119085)
We currently use the types encountered in the function body, not those
in type declarations, to perform total scalarization. Bug PR 119085
uncovered that we were missing a check that, when the same data is
accessed with different aggregate types, those types are actually
compatible. Without it, we can base total scalarization on a type that
does not "cover" all live data in a different part of the function.
This patch adds the check.
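As a hedged, hypothetical illustration (not the PR testcase; types and the function name are invented), the same storage accessed with two incompatible aggregate types might look like:
  struct small { int i; };
  struct big   { int i; double d; };
  union u { struct small s; struct big b; };

  double
  f (struct big *p)
  {
    union u tmp;
    tmp.b = *p;               /* aggregate access with the larger type */
    struct small s2 = tmp.s;  /* aggregate access with an incompatible
                                 type that does not cover tmp.b.d */
    return tmp.b.d + s2.i;
  }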
gcc/ChangeLog:
2025-07-21 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/119085
* tree-sra.cc (sort_and_splice_var_accesses): Prevent total
scalarization if two incompatible aggregates access the same place.
gcc/testsuite/ChangeLog:
2025-07-21 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/119085
* gcc.dg/tree-ssa/pr119085.c: New test.
Martin Jambor [Thu, 17 Jul 2025 16:28:33 +0000 (18:28 +0200)]
tree-sra: Fix grp_covered flag computation when totally scalarizing (PR117423)
The testcase of PR 117423 shows a flaw in the fancy way we now do
"total scalarization" in SRA. We use the types encountered in the
function body, not those in type declarations (allowing us to totally
scalarize when only one union field is ever used, since we then
effectively "skip" the union), and we can accommodate pre-existing
accesses that happen to fall into padding.
In this case, we skipped the union (bypassing the
totally_scalarizable_type_p check) and the access falling into the
"padding" is an aggregate, so it is not a candidate for SRA but does
actually contain data. Arguably, total scalarization should just bail
out when it encounters this situation (I decided not to depend on that,
mainly because we would need to detect all the cases where we
eventually cannot scalarize, such as when a scalar access has child
accesses), but the actual bug is that the detection of whether all data
in an aggregate is covered by replacements simply assumes this is
always the case when total scalarization triggers, which may not hold
in cases like this one, and perhaps others.
This patch fixes the bug by assuming only that all padding is taken
care of when total scalarization has triggered, not that every access
was actually scalarized.
gcc/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* tree-sra.cc (analyze_access_subtree): Fix computation of grp_covered
flag.
gcc/testsuite/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* gcc.dg/tree-ssa/pr117423.c: New test.
Jeff Law [Fri, 12 Sep 2025 22:08:38 +0000 (16:08 -0600)]
Fix latent LRA bug
Shreya's work to add the addptr pattern on the RISC-V port exposed a latent bug
in LRA.
We lazily allocate/reallocate the ira_reg_equiv structure, and when we do
a (re)allocation we over-allocate and zero-fill so that we don't have to
actually allocate and relocate the data so often.
In the case exposed by Shreya's work we had N requested entries at the last
reallocation step. We actually allocate N+M entries. During LRA we then
create enough new pseudos to end up with N+M+1 pseudos.
In get_equiv we read ira_reg_equiv[regno] without bounds checking, so we
read past the allocated part of the array and get back junk, which we then
use; depending on the precise contents we fault in various fun and
interesting ways.
We could either arrange to re-allocate ira_reg_equiv again on some path
through LRA (possibly in get_equiv itself), or we could just insert the
bounds check in get_equiv as is done elsewhere in LRA. Vlad indicated no
strong preference in an email last week.
So this just adds the bounds check in a manner similar to what's done elsewhere
in LRA. Bootstrapped and regression tested on x86_64 as well as RISC-V with
Shreya's work enabled and regtested across the various embedded targets.
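A self-contained sketch of the pattern (illustrative names, not the actual lra-constraints.cc code): a lazily over-allocated table plus the bounds check the patch adds:
  #include <stddef.h>

  static int *equiv_table;  /* analogous to ira_reg_equiv */
  static size_t equiv_len;  /* entries allocated so far */

  static int
  get_equiv_checked (size_t regno)
  {
    /* New pseudos can be created after the last (re)allocation, so
       regno may be >= equiv_len; reading the table there returns
       junk.  Bounds check first, as the patch does.  */
    if (regno >= equiv_len)
      return 0;  /* treat as "no equivalence recorded" */
    return equiv_table[regno];
  }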
gcc/
* lra-constraints.cc (get_equiv): Bounds check before accessing
data in ira_reg_equiv.
There are at least two cases where tree-switch-conversion leads
to unpleasant resource allocation:
PR49857: The lookup table lives in RAM. This is the case for all
devices that locate .rodata in RAM, which is almost all AVR devices.
PR81540: Code is bloated for 64-bit inputs.
As far as PR49857 is concerned, a target hook that may add an
address-space qualifier to the lookup table is the obvious
solution, though such a patch has always been rejected by
global maintainers for non-technical reasons.
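For reference, a hedged sketch (hypothetical code) of the kind of dense switch that tree-switch-conversion turns into a lookup table, which then lands in RAM on devices that place .rodata there:
  int
  f (int i)
  {
    /* Converted into an indexed load from a table in .rodata.  */
    switch (i)
      {
      case 0: return 4;
      case 1: return 9;
      case 2: return 1;
      case 3: return 7;
      case 4: return 2;
      default: return 0;
      }
  }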
This also covers bad_function_call::what from C++11.
libstdc++-v3/ChangeLog:
* doc/html/manual/status.html: Regenerate.
* doc/xml/manual/status_cxx2011.xml: Add entry for bad_function_call.
* doc/xml/manual/status_cxx2017.xml: Add entries for bad_any_cast
and nullptr_t output. Update entry for sf.cmath. Fix stable name for
mem.res.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 39d7c4d42a764a86644198a517f58a94f467cdbd)
Tomasz Kamiński [Fri, 5 Sep 2025 11:16:40 +0000 (13:16 +0200)]
libstdc++: Document missing implementation-defined behavior for std::filesystem.
libstdc++-v3/ChangeLog:
* doc/html/manual/status.html: Regenerate the file.
* doc/xml/manual/status_cxx2017.xml: Add more entries.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit d6c370b8e96d43448537276d91c2b33fedb9754a)
Jonathan Wakely [Tue, 2 Sep 2025 21:30:46 +0000 (22:30 +0100)]
libstdc++: Make CTAD ignore pair(const T1&, const T2&) constructor [PR110853]
For the pair(T1, T2) explicit deduction guide to decay its arguments as
intended, we need the pair(const T1&, const T2&) constructor to not be
used for CTAD. Otherwise we try to instantiate pair<T1, T2> without
decaying, which is ill-formed for function lvalues.
Use std::type_identity_t<T1> to make the constructor unusable for an
implicit deduction guide.
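A hedged illustration of the intended deduction (compiled in C++20 mode against a fixed libstdc++; hypothetical code, not the testsuite file):
  #include <type_traits>
  #include <utility>

  void f () {}

  int main ()
  {
    /* The explicit pair(T1, T2) guide decays the function lvalue.
       Before the fix, the pair(const T1&, const T2&) constructor
       produced an implicit guide that deduced the ill-formed
       pair<void(), int> instead.  */
    std::pair p (f, 1);
    static_assert (std::is_same_v<decltype (p), std::pair<void (*)(), int>>);
  }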
libstdc++-v3/ChangeLog:
PR libstdc++/110853
* include/bits/stl_pair.h [C++20] (pair(const T1&, const T2&)):
Use std::type_identity_t<T1> for first parameter.
* testsuite/20_util/pair/cons/110853.cc: New test.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 0bb0d1d2880d562298eeec8eee4ab4e8ba943260)
Jonathan Wakely [Mon, 1 Sep 2025 17:12:27 +0000 (18:12 +0100)]
libstdc++: Fix std::get<T> for std::pair with reference members [PR121745]
Make the std::get<T> overloads for rvalues use std::forward<T>(p.first)
not std::move(p.first), so that lvalue reference members are not
incorrectly converted to rvalues.
It might appear that std::move(p).first would also work, but the
language rules say that for std::pair<T&&, U> that would produce T&
rather than the expected T&& (see the discussion in P2445R1 §8.2).
Additional tests are added to verify all combinations of reference
members, value categories, and const-qualification.
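A hedged sketch of the fixed behaviour (hypothetical code, not the testsuite file):
  #include <type_traits>
  #include <utility>

  int main ()
  {
    int i = 0;
    std::pair<int&, long> p (i, 1L);
    /* With std::forward<T>(p.first), the int& member of an rvalue
       pair stays an lvalue reference rather than becoming int&&.  */
    static_assert (std::is_same_v<decltype (std::get<int&> (std::move (p))),
                                  int&>);
  }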
libstdc++-v3/ChangeLog:
PR libstdc++/121745
* include/bits/stl_pair.h (get): Use forward instead of move in
std::get<T> overloads for rvalue pairs.
* testsuite/20_util/pair/astuple/get_by_type.cc: Check all value
categories and cv-qualification.
aarch64: Fix mode mismatch when building a predicate [PR121118]
This PR is about a case where we used aarch64_expand_sve_const_pred_trn
to combine two predicates, one of which was constructed using
aarch64_sve_move_pred_via_while. The former requires the inputs
to have mode VNx16BI, but the latter returned VNx8BI for a .H
WHILELO.
The proper fix, used on trunk, is to make the pattern emitted by
aarch64_sve_move_pred_via_while produce a VNx16BI for all element
sizes, since every bit of the result is significant. However,
that required some target-independent changes that are too invasive
to backport. This patch goes for the simpler (but less robust) approach
of using the original pattern and casting it to VNx16BI after the fact.
Since the WHILELO pattern is an unspec, the chances of something
optimising it in a way that changes the undefined bits of the output
should be very low, especially on a release branch. It is still a less
satisfactory fix though.
gcc/
PR target/121118
* config/aarch64/aarch64.cc (aarch64_sve_move_pred_via_while):
Return a VNx16BI predicate.
gcc/testsuite/
PR target/121118
* gcc.target/aarch64/sve/acle/general/pr121118_1.c: New test.
Pengfei Li [Thu, 14 Aug 2025 13:59:48 +0000 (13:59 +0000)]
AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]
This patch fixes incorrect constraints in RTL patterns for AArch64 SVE
gather/scatter with type widening/narrowing and vector-plus-immediate
addressing. The bug leads to the "immediate offset out of range"
errors shown below during assembly, eventually causing compilation failures.
/tmp/ccsVqBp1.s: Assembler messages:
/tmp/ccsVqBp1.s:54: Error: immediate offset out of range 0 to 31 at operand 3 -- `ld1b z1.d,p0/z,[z1.d,#64]'
Current RTL patterns for such instructions incorrectly use vgw or vgd
constraints for the immediate operand, based on the vector element type
in Z registers (zN.s or zN.d). However, for gather/scatter with type
conversions, the immediate range for vector-plus-immediate addressing is
determined by the element type in memory, which differs from that in
vector registers. Using the wrong constraint can produce out-of-range
offset values that cannot be encoded in the instruction.
This patch corrects the constraints used in these patterns. A test case
that reproduces the issue is also included.
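A hedged sketch of the kind of access involved (an assumption about the shape of the testcase, not the included one):
  #include <arm_sve.h>

  svuint64_t
  f (svbool_t pg, svuint64_t bases)
  {
    /* An extending gather load of bytes: the Z registers hold .d
       elements, but the element in memory is one byte, so the legal
       immediate range is 0-31, not the .d range the wrong constraint
       permitted (hence the rejected "#64" shown above).  */
    return svld1ub_gather_u64base_offset_u64 (pg, bases, 64);
  }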
Bootstrapped and regression-tested on aarch64-linux-gnu.
gcc/ChangeLog:
PR target/121449
* config/aarch64/aarch64-sve.md
(mask_gather_load<mode><v_int_container>): Use vg<Vesize>
constraints for alternatives with immediate offset.
(mask_scatter_store<mode><v_int_container>): Likewise.
gcc/testsuite/ChangeLog:
PR target/121449
* g++.target/aarch64/sve/pr121449.C: New test.
The previous patch for PR121294 handled svtrn1/2, svuzp1/2, and svzip1/2.
This one extends it to handle svrev intrinsics, where the same kind of
wrong code can be generated.
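A hedged sketch of an affected intrinsic (hypothetical usage, not the new testcase):
  #include <arm_sve.h>

  svbool_t
  f (svbool_t p)
  {
    /* A predicate svrev: every bit of the result is significant,
       not just one bit per element as the shared permute patterns
       assumed.  */
    return svrev_b16 (p);
  }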
gcc/
PR target/121294
* config/aarch64/aarch64.md (UNSPEC_REV_PRED): New unspec.
* config/aarch64/aarch64-sve.md (@aarch64_sve_rev<mode>_acle)
(*aarch64_sve_rev<mode>_acle): New patterns.
* config/aarch64/aarch64-sve-builtins-base.cc
(svrev_impl::expand): Use the new patterns for boolean svrev.
gcc/testsuite/
PR target/121294
* gcc.target/aarch64/sve/acle/general/rev_2.c: New test.
aarch64: Use VNx16BI for more permutations [PR121294]
The patterns for the predicate forms of svtrn1/2, svuzp1/2,
and svzip1/2 are shared with aarch64_vectorize_vec_perm_const.
The .H, .S, and .D forms operate on VNx8BI, VNx4BI, and VNx2BI
respectively. Thus, for all four element widths, there is one
significant bit per element, for both the inputs and the output.
That's appropriate for aarch64_vectorize_vec_perm_const but not
for the ACLE intrinsics, where every bit of the output is
significant, and where every bit of the selected input elements
is therefore also significant. The current expansion can lead
the optimisers to simplify inputs by changing the upper bits
of the input elements (since the current patterns claim that
those bits don't matter), which in turn leads to wrong code.
The ACLE expansion should operate on VNx16BI instead, for all
element widths.
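A hedged sketch of one affected intrinsic (hypothetical usage, not a testsuite file):
  #include <arm_sve.h>

  svbool_t
  f (svbool_t a, svbool_t b)
  {
    /* A predicate svtrn1 on .h elements: unlike the
       vec_perm_const use of the same pattern, every bit of the
       inputs and the output matters here.  */
    return svtrn1_b16 (a, b);
  }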
There was already a pattern for a VNx16BI-only form of TRN1, for
constructing certain predicate constants. The patch generalises it to
handle the other five permutations as well. For the reasons given in
the comments, this is done by making the permutation unspec an operand
to a new UNSPEC_PERMUTE_PRED, rather than overloading the existing
unspecs, and rather than adding a new unspec for each permutation.
gcc/
PR target/121294
* config/aarch64/iterators.md (UNSPEC_TRN1_CONV): Delete.
(UNSPEC_PERMUTE_PRED): New unspec.
* config/aarch64/aarch64-sve.md (@aarch64_sve_trn1_conv<mode>):
Replace with...
(@aarch64_sve_<perm_insn><mode>_acle)
(*aarch64_sve_<perm_insn><mode>_acle): ...these new patterns.
* config/aarch64/aarch64.cc (aarch64_expand_sve_const_pred_trn):
Update accordingly.
* config/aarch64/aarch64-sve-builtins-functions.h
(binary_permute::expand): Use the new _acle patterns for
predicate operations.
H.J. Lu [Thu, 24 Jul 2025 14:38:13 +0000 (07:38 -0700)]
x86: Disallow -mtls-dialect=gnu with no_caller_saved_registers
__tls_get_addr doesn't preserve vector registers. When a function
with the no_caller_saved_registers attribute calls __tls_get_addr, YMM
and ZMM registers will be clobbered. Issue an error and suggest
-mtls-dialect=gnu2 in this case.
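A hedged sketch of the rejected combination (hypothetical code; compiled with -fpic -mtls-dialect=gnu, the TLS access below goes through __tls_get_addr):
  __thread int counter;

  __attribute__((no_caller_saved_registers))
  void
  bump (void)
  {
    /* The global-dynamic TLS access calls __tls_get_addr, which may
       clobber YMM/ZMM registers the attribute promises to preserve;
       GCC now reports an error here and suggests -mtls-dialect=gnu2.  */
    counter++;
  }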
gcc/
PR target/121208
* config/i386/i386.cc (ix86_tls_get_addr): Issue an error for
-mtls-dialect=gnu with no_caller_saved_registers attribute and
suggest -mtls-dialect=gnu2.