Richard Biener [Mon, 14 Jul 2025 12:09:28 +0000 (14:09 +0200)]
tree-optimization/121059 - fixup loop mask query
When we opportunistically mask an operand of a AND with an already
available loop mask we need to query that set with the correct number
of masks we expect.
PR tree-optimization/121059
* tree-vect-stmts.cc (vectorizable_operation): Query
scalar_cond_masked_set with the correct number of masks.
Richard Biener [Mon, 7 Jul 2025 07:56:50 +0000 (09:56 +0200)]
tree-optimization/120817 - bogus DSE of .MASK_STORE
DSE used ao_ref_init_from_ptr_and_size for .MASK_STORE but
alias-analysis will use the specified size to disambiguate
against smaller objects. For .MASK_STORE we instead have to
make the access size unspecified but we can still constrain
the access extent based on the maximum size possible.
PR tree-optimization/120817
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Use
ao_ref_init_from_ptr_and_range with unknown size for
.MASK_STORE and .MASK_LEN_STORE.
LIU Hao [Sat, 25 Oct 2025 09:19:34 +0000 (17:19 +0800)]
x86-64: Use `movsxd` to perform SI-to-DI extension in Intel syntax
Although there's no possibility of ambiguity, Intel manual says the mnemonic
for DWORD-to-QWORD sign-extension operation should be MOVSXD. Some assemblers
(GNU AS, NASM) also overload MOVSX, but some others don't accept MOVSX (LLVM,
MASM, YASM in NASM mode) and require MOVSXD.
This mnemonic was introduced in r0-34259-g123bf9e3f4056d in 2001, and has not
been updated ever since.
gcc/ChangeLog:
PR target/119079
* config/i386/i386.md: Use `movsxd` to perform SI-to-DI extension in Intel
syntax.
Eric Botcazou [Mon, 20 Oct 2025 09:21:21 +0000 (11:21 +0200)]
Ada: Fix spurious warning for renaming of component of VFA record
This is a regression present on the mainline and all active branches: the
compiler gives a spurious "is not referenced" warning for the renaming of
a component of a Volatile_Full_Access record.
gcc/ada/
PR ada/107536
* exp_ch2.adb (Expand_Renaming): Mark the entity as referenced.
gcc/testsuite/
* gnat.dg/renaming18.adb: New test.
In this PR we have a reduction of a vector constructor, where the
type of the constructor is int16x8_t and the elements are int16x4_t;
i.e. it is representing a concatenation of two vectors.
This triggers a match.pd pattern which looks like it was written to
handle reductions of vector constructors where the elements of the ctor
are scalars, not vectors. There is no type check to enforce this
property, which leads to the pattern replacing a reduction to scalar
with an int16x4_t vector in this case, which of course is a type error,
leading to an invalid GIMPLE ICE.
This patch adds a type check to the pattern, only going ahead with the
transformation if the element type of the ctor matches that of the
reduction.
gcc/ChangeLog:
PR tree-optimization/121772
* match.pd: Add type check to reduc(ctor) pattern.
gcc/testsuite/ChangeLog:
PR tree-optimization/121772
* gcc.target/aarch64/torture/pr121772.c: New test.
Richard Biener [Tue, 14 May 2024 09:13:51 +0000 (11:13 +0200)]
tree-optimization/120156 - ICE in ptr_derefs_may_alias_p
This picks the ptr_derefs_may_alias_p fix from the PR99954 fix
which said: This makes us run into a latent issue in
ptr_deref_may_alias_decl_p when the pointer is something like &MEM[0].a
in which case we fail to handle non-SSA name pointers. Add code
similar to what we have in ptr_derefs_may_alias_p.
PR tree-optimization/120156
* tree-ssa-alias.cc (ptr_deref_may_alias_decl_p): Verify
the pointer is an SSA name.
So as Georg-Johann discusses in the BZ, reload_cse_move2add can generate
incorrect code when optimizing code with clobbers. Specifically in the
case where we try to optimize a sequence of 4 operations down to 3
operations we can reset INSN to the next instruction and continue the loop.
That skips the code to invalidate objects based on things like REG_INC
nodes, stack pushes and most importantly clobbers attached to the current
insn.
This patch factors all of the invalidation code used by reload_cse_move2add
into a new function and calls it at the appropriate time.
Georg-Johann has confirmed this patch fixes his avr bug and I've had it in
my tester over the weekend. It's bootstrapped and regression tested on
aarch64, m68k, sh4, alpha and hppa. It's also regression tested successfully
on a wide variety of other targets.
When none of mprefer-vector-width, avx256_optimal/avx128_optimal,
avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC will
set ix86_{move_max,store_max} as max available vector length except
for AVX part.
So for -mavx2, vectorizer will choose 256-bit for vectorization, but
128-bit is used for struct copy, there could be a potential STLF issue
due to this "misalign".
Martin Jambor [Wed, 23 Jul 2025 09:22:33 +0000 (11:22 +0200)]
tree-sra: Avoid total SRA if there are incompat. aggregate accesses (PR119085)
We currently use the types encountered in the function body and not in
type declaration to perform total scalarization. Bug PR 119085
uncovered that we miss a check that when the same data is accessed
with aggregate types that those are actually compatible. Without it,
we can base total scalarization on a type that does not "cover" all
live data in a different part of the function. This patch adds the
check.
gcc/ChangeLog:
2025-07-21 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/119085
* tree-sra.cc (sort_and_splice_var_accesses): Prevent total
scalarization if two incompatible aggregates access the same place.
gcc/testsuite/ChangeLog:
2025-07-21 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/119085
* gcc.dg/tree-ssa/pr119085.c: New test.
Martin Jambor [Thu, 17 Jul 2025 16:28:33 +0000 (18:28 +0200)]
tree-sra: Fix grp_covered flag computation when totally scalarizing (PR117423)
Testcase of PR 117423 shows a flaw in the fancy way we do "total
scalarization" in SRA now. We use the types encountered in the
function body and not in type declaration (allowing us to totally
scalarize when only one union field is ever used, since we effectively
"skip" the union then) and can accommodate pre-existing accesses that
happen to fall into padding.
In this case, we skipped the union (bypassing the
totally_scalarizable_type_p check) and the access falling into the
"padding" is an aggregate and so not a candidate for SRA but actually
containing data. Arguably total scalarization should just bail out
when it encounters this situation (but I decided not to depend on this
mainly because we'd need to detect all cases when we eventually cannot
scalarize, such as when a scalar access has children accesses) but the
actual bug is that the detection if all data in an aggregate is indeed
covered by replacements just assumes that is always the case if total
scalarization triggers which however may not be the case in cases like
this - and perhaps more.
This patch fixes the bug by just assuming that all padding is taken
care of when total scalarization triggered, not that every access was
actually scalarized.
gcc/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* tree-sra.cc (analyze_access_subtree): Fix computation of grp_covered
flag.
gcc/testsuite/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* gcc.dg/tree-ssa/pr117423.c: New test.
Jeff Law [Fri, 12 Sep 2025 22:08:38 +0000 (16:08 -0600)]
Fix latent LRA bug
Shreya's work to add the addptr pattern on the RISC-V port exposed a latent bug
in LRA.
We lazily allocate/reallocate the ira_reg_equiv structure and when we do
(re)allocation we'll over-allocate and zero-fill so that we don't have to
actually allocate and relocate the data so often.
In the case exposed by Shreya's work we had N requested entries at the last
rellocation step. We actually allocate N+M entries. During LRA we allocate
enough new pseudos and thus have N+M+1 pseudos.
In get_equiv we read ira_reg_equiv[regno] without bounds checking so we read
past the allocated part of the array and get back junk which we use and
depending on the precise contents we fault in various fun and interesting ways.
We could either arrange to re-allocate ira_reg_equiv again on some path through
LRA (possibly in get_equiv itself). We could also just insert the bounds check
in get_equiv like is done elsewhere in LRA. Vlad indicated no strong
preference in an email last week.
So this just adds the bounds check in a manner similar to what's done elsewhere
in LRA. Bootstrapped and regression tested on x86_64 as well as RISC-V with
Shreya's work enabled and regtested across the various embedded targets.
gcc/
* lra-constraints.cc (get_equiv): Bounds check before accessing
data in ira_reg_equiv.
There are at least two cases where tree-switch-conversion leads
to unpleasant resource allocation:
PR49857
The lookup table lives in RAM. This is the case for all
devices that locate .rodata in RAM, which is for almost
all AVR devices.
PR81540
Code is bloated for 64-bit inputs.
As far as PR49857 is concerned, a target hook that may add an
address-space qualifier to the lookup table is the obvious
solution, though a respective patch has always been rejected by
global maintainers for non-technical reasons.
This also covers bad_function_call::what from C++11.
libstdc++-v3/ChangeLog:
* doc/html/manual/status.html: Regenerate.
* doc/xml/manual/status_cxx2011.xml: Add entry for bad_function_call.
* doc/xml/manual/status_cxx2017.xml: Add entries for bad_any_cast
and nullptr_t output. Update entry for sf.cmath. Fix stable name for
mem.res.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 39d7c4d42a764a86644198a517f58a94f467cdbd)
Tomasz Kamiński [Fri, 5 Sep 2025 11:16:40 +0000 (13:16 +0200)]
libstdc++: Document missing implementation defined behavior for std::filesystem.
libstdc++-v3/ChangeLog:
* doc/html/manual/status.html: Regenerate the file.
* doc/xml/manual/status_cxx2017.xml: Addd more entires.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit d6c370b8e96d43448537276d91c2b33fedb9754a)
Jonathan Wakely [Tue, 2 Sep 2025 21:30:46 +0000 (22:30 +0100)]
libstdc++: Make CTAD ignore pair(const T1&, const T2&) constructor [PR110853]
For the pair(T1, T2) explicit deduction type to decay its arguments as
intended, we need the pair(const T1&, const T2&) constructor to not be
used for CTAD. Otherwise we try to instantiate pair<T1, T2> without
decaying, which is ill-formed for function lvalues.
Use std::type_identity_t<T1> to make the constructor unusable for an
implicit deduction guide.
libstdc++-v3/ChangeLog:
PR libstdc++/110853
* include/bits/stl_pair.h [C++20] (pair(const T1&, const T2&)):
Use std::type_identity_t<T1> for first parameter.
* testsuite/20_util/pair/cons/110853.cc: New test.
Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 0bb0d1d2880d562298eeec8eee4ab4e8ba943260)
Jonathan Wakely [Mon, 1 Sep 2025 17:12:27 +0000 (18:12 +0100)]
libstdc++: Fix std::get<T> for std::pair with reference members [PR121745]
Make the std::get<T> overloads for rvalues use std::forward<T>(p.first)
not std::move(p.first), so that lvalue reference members are not
incorrectly converted to rvalues.
It might appear that std::move(p).first would also work, but the
language rules say that for std::pair<T&&, U> that would produce T&
rather than the expected T&& (see the discussion in P2445R1 §8.2).
Additional tests are added to verify all combinations of reference
members, value categories, and const-qualification.
libstdc++-v3/ChangeLog:
PR libstdc++/121745
* include/bits/stl_pair.h (get): Use forward instead of move in
std::get<T> overloads for rvalue pairs.
* testsuite/20_util/pair/astuple/get_by_type.cc: Check all value
categories and cv-qualification.