Patrick Palka [Wed, 17 Sep 2025 01:00:50 +0000 (21:00 -0400)]
libstdc++: Explicitly pass -Wsystem-headers in tests that need it
When running libstdc++ tests using an installed gcc (as opposed to an
in-tree gcc), we naturally use system stdlib headers instead of the
in-tree headers. But warnings from within system headers are suppressed
by default, so tests that check for such warnings spuriously fail in such
a setup. This patch makes us compile such tests with -Wsystem-headers so
that they consistently pass.
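One way to do that (illustrative; the exact directive used may differ per
test) is a DejaGnu directive in each affected test:

    // { dg-additional-options "-Wsystem-headers" }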
Atomic support is enhanced: the existing atomic_compare_and_swapsi pattern is
fixed to handle side effects, and new atomic_fetch_op and atomic_test_and_set
patterns are added. As MicroBlaze has no QImode test-and-set instruction,
shift magic is used to implement atomic_test_and_set.
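A minimal C sketch of the shift technique, assuming only a word-wide
compare-and-swap is available (names and details are illustrative, not the
actual MicroBlaze expansion):

    #include <stdint.h>

    /* Emulate a byte-sized test-and-set with a 32-bit CAS: find the
       containing word, build masks for the byte, and retry until the
       word-wide exchange succeeds.  */
    static unsigned char
    byte_test_and_set (unsigned char *p)
    {
      uint32_t *wp = (uint32_t *) ((uintptr_t) p & ~(uintptr_t) 3);
      unsigned shift = ((uintptr_t) p & 3) * 8;  /* byte position; endianness
                                                    handling omitted */
      uint32_t mask = (uint32_t) 0xff << shift;
      uint32_t set = (uint32_t) 1 << shift;
      uint32_t old = __atomic_load_n (wp, __ATOMIC_RELAXED);
      uint32_t desired;
      do
        desired = (old & ~mask) | set;    /* store 1 into the byte */
      while (!__atomic_compare_exchange_n (wp, &old, desired, 1,
                                           __ATOMIC_SEQ_CST,
                                           __ATOMIC_RELAXED));
      return (old & mask) != 0;           /* was the byte already set?  */
    }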
When none of -mprefer-vector-width=, avx256_optimal/avx128_optimal, or
avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC sets
ix86_{move_max,store_max} to the maximum available vector length, except
for the AVX part.
So for -mavx2 the vectorizer will choose 256-bit vectors for vectorization,
while 128-bit moves are used for struct copies; this mismatch can cause a
potential store-to-load-forwarding (STLF) issue.
Jeff Law [Fri, 12 Sep 2025 22:08:38 +0000 (16:08 -0600)]
Fix latent LRA bug
Shreya's work to add the addptr pattern on the RISC-V port exposed a latent bug
in LRA.
We lazily allocate/reallocate the ira_reg_equiv structure, and when we do
(re)allocation we over-allocate and zero-fill so that we don't have to
actually allocate and relocate the data so often.
In the case exposed by Shreya's work, we had N requested entries at the last
reallocation step but actually allocated N+M entries. During LRA we then
created enough new pseudos to end up with N+M+1 of them.
In get_equiv we read ira_reg_equiv[regno] without bounds checking, so we read
past the allocated part of the array and get back junk, which we use;
depending on the precise contents we fault in various fun and interesting
ways.
We could either arrange to reallocate ira_reg_equiv again on some path through
LRA (possibly in get_equiv itself), or just insert a bounds check in get_equiv
as is done elsewhere in LRA. Vlad indicated no strong preference in an email
last week.
So this just adds the bounds check in a manner similar to what's done elsewhere
in LRA. Bootstrapped and regression tested on x86_64 as well as RISC-V with
Shreya's work enabled and regtested across the various embedded targets.
gcc/
* lra-constraints.cc (get_equiv): Bounds check before accessing
data in ira_reg_equiv.
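The generic shape of the problem and the fix (an illustrative sketch, not the
actual GCC code): a lazily over-allocated table indexed by pseudo number.

    struct equiv_entry { /* ... equivalence data ... */ };
    static equiv_entry *table;   // valid entries: [0, table_len)
    static int table_len;

    static equiv_entry *
    get_entry (int regno)
    {
      // Pseudos created after the last (re)allocation have no entry;
      // without this check we would read past the end of the array.
      if (regno >= table_len)
        return nullptr;          // treat as "no equivalence recorded"
      return &table[regno];
    }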
aarch64: PR target/121749: Use correct predicate for narrowing shift amounts
With g:d20b2ad845876eec0ee80a3933ad49f9f6c4ee30 the narrowing shift instructions
are now represented with standard RTL and more merging optimisations occur.
This exposed a wrong predicate for the shift amount operand.
The valid shift amount range is determined by the bit width of the narrow
destination elements, not of the input sources; for example, a narrowing
shift to 8-bit elements accepts shift amounts 1-8, not 1-16.
Correct this by using the vn_mode attribute when specifying the predicate, which
exists for this purpose.
I've spotted a few more narrowing shift patterns that need the restriction, so
they are updated as well.
Bootstrapped and tested on aarch64-none-linux-gnu.
s390: Emulate vec_cmp{eq,gt,gtu} for 128-bit integers
Mode iterator V_HW enables V1TI for target VXE, which means vec_cmpv1tiv1ti
becomes available; this leads to an ICE since there is no corresponding insn.
Fixed by emulating comparisons and enabling mode V1TI unconditionally
for V_HW. For the sake of symmetry, I also added TI mode to V_HW since
TF mode is already included. As a consequence the consumers of V_HW
vec_{splat,slb,sld,sldw,sldb,srdb,srab,srb,test_mask_int,test_mask}
also become available for 128-bit integers.
This fixes gcc.c-torture/execute/pr105613.c and gcc.dg/pr106063.c.
* gcc.target/s390/vector/vec-cmp-emu-1.c: New test.
* gcc.target/s390/vector/vec-cmp-emu-2.c: New test.
* gcc.target/s390/vector/vec-cmp-emu-3.c: New test.
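The scalar logic being emulated can be sketched as a decomposition into
64-bit halves (illustrative only; the actual expansion uses vector
instructions, and gtu uses unsigned high-half compares):

    #include <stdint.h>

    /* 128-bit equality: both halves must match.  */
    static int
    eq128 (uint64_t a_hi, uint64_t a_lo, uint64_t b_hi, uint64_t b_lo)
    {
      return a_hi == b_hi && a_lo == b_lo;
    }

    /* 128-bit signed greater-than: compare the high halves signed; on
       equality, compare the low halves unsigned.  */
    static int
    gt128 (int64_t a_hi, uint64_t a_lo, int64_t b_hi, uint64_t b_lo)
    {
      return a_hi > b_hi || (a_hi == b_hi && a_lo > b_lo);
    }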
There are at least two cases where tree-switch-conversion leads
to unpleasant resource allocation:
PR49857
The lookup table lives in RAM. This is the case for all
devices that locate .rodata in RAM, which is almost
all AVR devices.
PR81540
Code is bloated for 64-bit inputs.
As far as PR49857 is concerned, a target hook that may add an
address-space qualifier to the lookup table is the obvious
solution, though such a patch has always been rejected by
global maintainers for non-technical reasons.
This also covers bad_function_call::what from C++11.
libstdc++-v3/ChangeLog:
* doc/html/manual/status.html: Regenerate.
* doc/xml/manual/status_cxx2011.xml: Add entry for bad_function_call.
* doc/xml/manual/status_cxx2017.xml: Add entries for bad_any_cast
and nullptr_t output. Update entry for sf.cmath. Fix stable name for
mem.res.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 39d7c4d42a764a86644198a517f58a94f467cdbd)
Tomasz Kamiński [Fri, 5 Sep 2025 11:16:40 +0000 (13:16 +0200)]
libstdc++: Document missing implementation-defined behavior for std::filesystem.
libstdc++-v3/ChangeLog:
* doc/html/manual/status.html: Regenerate the file.
* doc/xml/manual/status_cxx2017.xml: Add more entries.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit d6c370b8e96d43448537276d91c2b33fedb9754a)
Jonathan Wakely [Tue, 2 Sep 2025 21:30:46 +0000 (22:30 +0100)]
libstdc++: Make CTAD ignore pair(const T1&, const T2&) constructor [PR110853]
For the pair(T1, T2) explicit deduction guide to decay its arguments as
intended, we need the pair(const T1&, const T2&) constructor to not be
used for CTAD. Otherwise we try to instantiate pair<T1, T2> without
decaying, which is ill-formed for function lvalues.
Use std::type_identity_t<T1> to make the constructor unusable for an
implicit deduction guide.
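An illustrative case (assumed from the PR) where the undecayed instantiation
is ill-formed:

    #include <utility>

    void f();

    // CTAD should use the pair(T1, T2) guide and decay f to void(*)():
    std::pair p(f, 42);   // std::pair<void (*)(), int>

    // An implicit guide formed from pair(const T1&, const T2&) would try
    // to instantiate std::pair<void(), int>, which is ill-formed.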
libstdc++-v3/ChangeLog:
PR libstdc++/110853
* include/bits/stl_pair.h [C++20] (pair(const T1&, const T2&)):
Use std::type_identity_t<T1> for first parameter.
* testsuite/20_util/pair/cons/110853.cc: New test.
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 0bb0d1d2880d562298eeec8eee4ab4e8ba943260)
Jonathan Wakely [Mon, 1 Sep 2025 17:12:27 +0000 (18:12 +0100)]
libstdc++: Fix std::get<T> for std::pair with reference members [PR121745]
Make the std::get<T> overloads for rvalues use std::forward<T>(p.first)
not std::move(p.first), so that lvalue reference members are not
incorrectly converted to rvalues.
It might appear that std::move(p).first would also work, but the
language rules say that for std::pair<T&&, U> that would produce T&
rather than the expected T&& (see the discussion in P2445R1 §8.2).
Additional tests are added to verify all combinations of reference
members, value categories, and const-qualification.
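A sketch of the preserved value category (illustrative):

    #include <type_traits>
    #include <utility>

    void
    check ()
    {
      int i = 0;
      std::pair<int&, char> p(i, 'x');
      // For an rvalue pair, a member of lvalue-reference type must still
      // be yielded as an lvalue reference, not converted to an rvalue:
      static_assert (std::is_same_v<decltype(std::get<int&>(std::move(p))),
                                    int&>);
    }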
libstdc++-v3/ChangeLog:
PR libstdc++/121745
* include/bits/stl_pair.h (get): Use forward instead of move in
std::get<T> overloads for rvalue pairs.
* testsuite/20_util/pair/astuple/get_by_type.cc: Check all value
categories and cv-qualification.
Jonathan Wakely [Wed, 24 Jul 2024 17:08:03 +0000 (18:08 +0100)]
libstdc++: Implement LWG 3836 for std::expected bool conversions
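The problematic conversion looks like this (an assumed illustration of the
LWG 3836 issue):

    #include <expected>   // C++23

    std::expected<bool, int>  a = false;  // holds the value false
    // Without the LWG 3836 constraint, this conversion could use bool(a),
    // which is true because a has a value, instead of copying the stored
    // false:
    std::expected<bool, long> b = a;      // must hold false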
libstdc++-v3/ChangeLog:
* include/std/expected (expected): Constrain constructors to
prevent problematic bool conversions, as per LWG 3836.
* testsuite/20_util/expected/lwg3836.cc: New test.
Jonathan Wakely [Mon, 1 Sep 2025 21:22:20 +0000 (22:22 +0100)]
libstdc++: Fix -Wswitch warning in <regex>
This fixes a warning seen with -Wsystem-headers:
include/c++/14.3.0/bits/regex_compiler.h:191:11: warning: case value '0' not in enumerated type 'std::regex_constants::syntax_option_type' [-Wswitch]
191 | case _FlagT(0):
| ^~~~
There's no diagnostic on trunk since the flag_enum attribute was added
to the enum type in r15-3500-g1914ca8791ce4e.
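The generic shape of such a fix (illustrative, not the exact libstdc++ code):

    enum syntax_option_type { ECMAScript = 1, icase = 2 };

    int
    classify (syntax_option_type f)
    {
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wswitch"
      switch (f)
        {
        case syntax_option_type(0):   // value outside the enumerators
          return 0;
        default:
          return 1;
        }
    #pragma GCC diagnostic pop
    }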
libstdc++-v3/ChangeLog:
* include/bits/regex_compiler.h (_Compiler::_S_validate): Add
diagnostic pragma to disable -Wswitch warning.
Jonathan Wakely [Tue, 19 Aug 2025 17:02:53 +0000 (18:02 +0100)]
libstdc++: Check _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK with #if [PR121496]
The change in r14-905-g3b7cb33033fbe6 to disable the use of
pthread_mutex_clocklock when TSan is active assumed that the
_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK macro was always checked with #if
rather than #ifdef, which was not true.
This makes the checks use #if consistently.
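The distinction matters because the macro may now be defined to 0 (a sketch;
the exact definition differs):

    #define _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK 0   // e.g. under TSan

    #ifdef _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK  // wrong: defined, so taken
    #endif

    #if _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK     // right: 0, so skipped
    #endif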
libstdc++-v3/ChangeLog:
PR libstdc++/121496
* include/std/mutex (__timed_mutex_impl::_M_try_wait_until):
Change preprocessor condition to use #if instead of #ifdef.
(recursive_timed_mutex::_M_clocklock): Likewise.
* testsuite/30_threads/timed_mutex/121496.cc: New test.
When I added this explicit specialization in r14-1433-gf150a084e25eaa I
used the wrong value for the number of mantissa digits (I used 112
instead of 113). Then when I refactored it in r14-1582-g6261d10521f9fd I
used the value calculated from the incorrect value (35 instead of 36).
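For the IEEE binary128 format the correct values are 113 mantissa digits and
max_digits10 == 36, since max_digits10 = ceil(113 * log10(2)) + 1 = 35 + 1.
A sketch, assuming the specialization in question is for a binary128 type
such as _Float128:

    #include <limits>

    static_assert (std::numeric_limits<_Float128>::digits == 113);
    static_assert (std::numeric_limits<_Float128>::max_digits10 == 36);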
Harald Anlauf [Fri, 27 Jun 2025 21:00:48 +0000 (23:00 +0200)]
Fortran: follow-up fix to checking of renamed-on-use interface name [PR120784]
Commit r16-1633 introduced a regression for imported interfaces that were
not renamed-on-use, since the related logic did not take into account that
the absence of renaming could be represented by an empty string.
Martin Jambor [Wed, 23 Jul 2025 09:22:33 +0000 (11:22 +0200)]
tree-sra: Avoid total SRA if there are incompat. aggregate accesses (PR119085)
We currently use the types encountered in the function body, not those in
the type declaration, to perform total scalarization. Bug PR 119085
uncovered that we were missing a check that, when the same data is
accessed with different aggregate types, those types are actually
compatible. Without it, we can base total scalarization on a type that
does not "cover" all live data in a different part of the function. This
patch adds the check.
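A hypothetical shape of the problem (the real testcase is
gcc.dg/tree-ssa/pr119085.c): one memory location accessed through two
incompatible aggregate types.

    union u
    {
      struct inner_a { int i; float f; } a;
      struct inner_b { float f; int i; } b;
    };

    void consume (union u *);

    int
    f (void)
    {
      union u x;
      x.a.i = 1;        /* accessed as struct inner_a ...  */
      x.a.f = 2.0f;
      consume (&x);
      return x.b.i;     /* ... and as the incompatible struct inner_b */
    }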
gcc/ChangeLog:
2025-07-21 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/119085
* tree-sra.cc (sort_and_splice_var_accesses): Prevent total
scalarization if two incompatible aggregates access the same place.
gcc/testsuite/ChangeLog:
2025-07-21 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/119085
* gcc.dg/tree-ssa/pr119085.c: New test.
Martin Jambor [Fri, 18 Jul 2025 10:42:11 +0000 (12:42 +0200)]
tree-sra: Fix grp_covered flag computation when totally scalarizing (PR117423)
The testcase of PR 117423 shows a flaw in the fancy way we now do "total
scalarization" in SRA. We use the types encountered in the
function body, not those in the type declaration (allowing us to totally
scalarize when only one union field is ever used, since we effectively
"skip" the union then), and we can accommodate pre-existing accesses that
happen to fall into padding.
In this case, we skipped the union (bypassing the
totally_scalarizable_type_p check), and the access falling into the
"padding" is an aggregate, so it is not a candidate for SRA but actually
contains data. Arguably, total scalarization should just bail out when it
encounters this situation (I decided not to depend on that, mainly because
we would need to detect all cases where we eventually cannot scalarize,
such as when a scalar access has child accesses). The actual bug, however,
is that the code detecting whether all data in an aggregate is covered by
replacements simply assumes this is always the case whenever total
scalarization triggers, which may not hold in cases like this one, and
perhaps others.
This patch fixes the bug by only assuming that all padding is taken
care of when total scalarization triggered, not that every access was
actually scalarized.
gcc/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* tree-sra.cc (analyze_access_subtree): Fix computation of grp_covered
flag.
gcc/testsuite/ChangeLog:
2025-07-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/117423
* gcc.dg/tree-ssa/pr117423.c: New test.
aarch64: Fix mode mismatch when building a predicate [PR121118]
This PR is about a case where we used aarch64_expand_sve_const_pred_trn
to combine two predicates, one of which was constructed using
aarch64_sve_move_pred_via_while. The former requires the inputs
to have mode VNx16BI, but the latter returned VNx8BI for a .H
WHILELO.
The proper fix, used on trunk, is to make the pattern emitted by
aarch64_sve_move_pred_via_while produce a VNx16BI for all element
sizes, since every bit of the result is significant. However,
that required some target-independent changes that are too invasive
to backport. This patch goes for the simpler (but less robust) approach
of using the original pattern and casting it to VNx16BI after the fact.
Since the WHILELO pattern is an unspec, the chances of something
optimising it in a way that changes the undefined bits of the output
should be very low, especially on a release branch. It is still a less
satisfactory fix though.
gcc/
PR target/121118
* config/aarch64/aarch64.cc (aarch64_sve_move_pred_via_while):
Return a VNx16BI predicate.
gcc/testsuite/
PR target/121118
* gcc.target/aarch64/sve/acle/general/pr121118_1.c: New test.
Pengfei Li [Thu, 7 Aug 2025 14:52:45 +0000 (14:52 +0000)]
AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]
This patch fixes incorrect constraints in RTL patterns for AArch64 SVE
gather/scatter with type widening/narrowing and vector-plus-immediate
addressing. The bug leads to "immediate offset out of range" errors like
the one below during assembly, eventually causing compilation failures.
/tmp/ccsVqBp1.s: Assembler messages:
/tmp/ccsVqBp1.s:54: Error: immediate offset out of range 0 to 31 at operand 3 -- `ld1b z1.d,p0/z,[z1.d,#64]'
Current RTL patterns for such instructions incorrectly use the vgw or vgd
constraints for the immediate operand, based on the vector element type
in the Z registers (zN.s or zN.d). However, for gather/scatter with type
conversions, the immediate range for vector-plus-immediate addressing is
determined by the element type in memory, which differs from the one in
the vector registers. Using the wrong constraint can produce out-of-range
offset values that cannot be encoded in the instruction.
This patch corrects the constraints used in these patterns. A test case
that reproduces the issue is also included.
Bootstrapped and regression-tested on aarch64-linux-gnu.
gcc/ChangeLog:
PR target/121449
* config/aarch64/aarch64-sve.md
(mask_gather_load<mode><v_int_container>): Use vg<Vesize>
constraints for alternatives with immediate offset.
(mask_scatter_store<mode><v_int_container>): Likewise.
gcc/testsuite/ChangeLog:
PR target/121449
* g++.target/aarch64/sve/pr121449.C: New test.
aarch64: Adapt unwinder to linux's SME signal behaviour
SME uses a lazy save system to manage ZA. The idea is that,
if a function with ZA state wants to call a "normal" function,
it can leave its state in ZA and instead set up a lazy save buffer.
If, unexpectedly, that normal function contains a nested use of ZA,
that nested use of ZA must commit the lazy save first.
This lazy save system uses a special system register called TPIDR2_EL0.
See:
The ABI specifies that, on entry to an exception handler, the following
things must be true:
* PSTATE.SM must be 0 (the processor must be in non-streaming mode)
* PSTATE.ZA must be 0 (ZA must be off)
* TPIDR2_EL0 must be 0 (there must be no uncommitted lazy save)
This is normally done by making _Unwind_RaiseException & friends
commit any lazy save before they unwind. This also has the side
effect of ensuring that TPIDR2_EL0 is never left pointing to a
lazy save buffer that has been unwound.
However, things get more complicated with signals. If:
(a) a signal is raised while ZA is dormant (that is, while there is an
uncommitted lazy save);
(b) the signal handler throws an exception; and
(c) that exception is caught outside the signal handler
something must ensure that the lazy save from (a) is committed.
This would be simple if the signal handler was entered with ZA and
TPIDR2_EL0 intact. However, for various good reasons that are out
of scope here, this is not done. Instead, Linux now clears both
TPIDR2_EL0 and PSTATE.ZA before entering a signal handler, see:
Therefore, it is the unwinder that must simulate a commit of the lazy
save from (a). It can do this by reading the previous values of
TPIDR2_EL0 and ZA from the sigcontext.
The SME-related sigcontext structures were only added to linux's
asm/sigcontext.h relatively recently, and we can't rely on GCC being
built against such recent kernel header files. The patch therefore
defines the relevant macros if they are not defined and provides types
that comply with the ABI layout of the corresponding linux types.
The patch includes some ugly casting in an attempt to support big-endian
ILP32, even though SME on big-endian ILP32 linux should never be a thing.
We can remove it if we also remove ILP32 support from GCC.
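A sketch of such fallback definitions (assumed here; the authoritative names
and layout are those in linux's asm/sigcontext.h):

    #include <stdint.h>

    /* Mirrors linux's struct _aarch64_ctx: the header that starts every
       extra record in the signal frame's reserved area.  */
    struct _aarch64_ctx_fallback
    {
      uint32_t magic;
      uint32_t size;
    };

    #ifndef TPIDR2_MAGIC
    #define TPIDR2_MAGIC 0x54504902

    /* Mirrors linux's struct tpidr2_context: the record that carries the
       value TPIDR2_EL0 had when the signal was raised.  */
    struct tpidr2_context_fallback
    {
      struct _aarch64_ctx_fallback head;
      uint64_t tpidr2;
    };
    #endif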
gcc/testsuite/
* lib/target-supports.exp (add_options_for_aarch64_sme)
(check_effective_target_aarch64_sme_hw): New procedures.
* g++.target/aarch64/sme/sme_throw_1.C: New test.
* g++.target/aarch64/sme/sme_throw_2.C: Likewise.
libgcc/
* config/aarch64/linux-unwind.h (aarch64_fallback_frame_state):
If a signal was raised while there was an uncommitted lazy save,
commit the save as part of the unwind process.
aarch64: Mark SME functions as .variant_pcs [PR121414]
Unlike base PCS functions, __arm_streaming and __arm_streaming_compatible
functions allow/require PSTATE.SM to be 1 on entry, so they need to
be treated as STO_AARCH64_VARIANT_PCS.
Similarly, functions that share ZA or ZT0 with their callers require
ZA to be active on entry, whereas the base PCS requires ZA to be
dormant or off. These functions too need to be marked as having
a variant PCS.
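For example (illustrative):

    // A streaming function allows PSTATE.SM to be 1 on entry, so its
    // symbol must be marked STO_AARCH64_VARIANT_PCS; the compiler now
    // emits a .variant_pcs directive for it in the assembly output.
    void callee () __arm_streaming;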
gcc/
PR target/121414
* config/aarch64/aarch64.cc (aarch64_is_variant_pcs): New function,
split out from...
(aarch64_asm_output_variant_pcs): ...here. Handle various types
of SME function type.
gcc/testsuite/
PR target/121414
* gcc.target/aarch64/sme/pr121414_1.c: New test.
The previous patch for PR121294 handled svtrn1/2, svuzp1/2, and svzip1/2.
This one extends it to handle svrev intrinsics, where the same kind of
wrong code can be generated.
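For example (illustrative ACLE usage; every bit of the result predicate is
significant to later uses):

    #include <arm_sve.h>

    svbool_t
    rev16 (svbool_t p)
    {
      return svrev_b16 (p);   // reverse as 16-bit predicate elements
    }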
gcc/
PR target/121294
* config/aarch64/aarch64.md (UNSPEC_REV_PRED): New unspec.
* config/aarch64/aarch64-sve.md (@aarch64_sve_rev<mode>_acle)
(*aarch64_sve_rev<mode>_acle): New patterns.
* config/aarch64/aarch64-sve-builtins-base.cc
(svrev_impl::expand): Use the new patterns for boolean svrev.
gcc/testsuite/
PR target/121294
* gcc.target/aarch64/sve/acle/general/rev_2.c: New test.
aarch64: Use VNx16BI for more permutations [PR121294]
The patterns for the predicate forms of svtrn1/2, svuzp1/2,
and svzip1/2 are shared with aarch64_vectorize_vec_perm_const.
The .H, .S, and .D forms operate on VNx8BI, VNx4BI, and VNx2BI
respectively. Thus, for all four element widths, there is one
significant bit per element, for both the inputs and the output.
That's appropriate for aarch64_vectorize_vec_perm_const but not
for the ACLE intrinsics, where every bit of the output is
significant, and where every bit of the selected input elements
is therefore also significant. The current expansion can lead
the optimisers to simplify inputs by changing the upper bits
of the input elements (since the current patterns claim that
those bits don't matter), which in turn leads to wrong code.
The ACLE expansion should operate on VNx16BI instead, for all
element widths.
There was already a pattern for a VNx16BI-only form of TRN1, for
constructing certain predicate constants. The patch generalises it to
handle the other five permutations as well. For the reasons given in
the comments, this is done by making the permutation unspec an operand
to a new UNSPEC_PERMUTE_PRED, rather than overloading the existing
unspecs, and rather than adding a new unspec for each permutation.
gcc/
PR target/121294
* config/aarch64/iterators.md (UNSPEC_TRN1_CONV): Delete.
(UNSPEC_PERMUTE_PRED): New unspec.
* config/aarch64/aarch64-sve.md (@aarch64_sve_trn1_conv<mode>):
Replace with...
(@aarch64_sve_<perm_insn><mode>_acle)
(*aarch64_sve_<perm_insn><mode>_acle): ...these new patterns.
* config/aarch64/aarch64.cc (aarch64_expand_sve_const_pred_trn):
Update accordingly.
* config/aarch64/aarch64-sve-builtins-functions.h
(binary_permute::expand): Use the new _acle patterns for
predicate operations.
H.J. Lu [Thu, 24 Jul 2025 14:38:13 +0000 (07:38 -0700)]
x86: Disallow -mtls-dialect=gnu with no_caller_saved_registers
__tls_get_addr doesn't preserve vector registers. When a function
with no_caller_saved_registers attribute calls __tls_get_addr, YMM
and ZMM registers will be clobbered. Issue an error and suggest
-mtls-dialect=gnu2 in this case.
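An illustrative case (hypothetical code) that now gets the error when
compiled with -fpic -mtls-dialect=gnu:

    __thread int counter;

    __attribute__((no_caller_saved_registers))
    int
    bump (void)
    {
      /* Global-dynamic TLS would call __tls_get_addr, clobbering vector
         registers this attribute promises to preserve.  */
      return ++counter;
    }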
gcc/
PR target/121208
* config/i386/i386.cc (ix86_tls_get_addr): Issue an error for
-mtls-dialect=gnu with no_caller_saved_registers attribute and
suggest -mtls-dialect=gnu2.
The rtx cost values defined by the target backend affect the
calculation of register pressure classes in the IRA, and thus affect
scheduling. This can degrade program performance, as observed in
OpenSSL 3.5.1 SHA512 and SPEC CPU 2017 exchange_r.
The problem can be avoided by defining a fixed set of register pressure
classes in the target backend instead of letting the IRA
calculate them automatically.
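A sketch of such a hook (assumed shape; the register class names here are
illustrative, see loongarch.cc for the actual set):

    /* Return a fixed set of pressure classes instead of letting IRA
       derive them from the cost model.  */
    static int
    loongarch_compute_pressure_classes (reg_class *classes)
    {
      int n = 0;
      classes[n++] = GR_REGS;
      classes[n++] = FP_REGS;
      return n;
    }

    #undef  TARGET_COMPUTE_PRESSURE_CLASSES
    #define TARGET_COMPUTE_PRESSURE_CLASSES loongarch_compute_pressure_classes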
gcc/ChangeLog:
PR target/120476
* config/loongarch/loongarch.cc
(loongarch_compute_pressure_classes): New function.
(TARGET_COMPUTE_PRESSURE_CLASSES): Define.
mengqinggang [Fri, 8 Aug 2025 08:22:59 +0000 (16:22 +0800)]
LoongArch: Use macros instead of an enum for the base ABI type
Enum values can't be used in #if: in a #if expression, identifiers that
are not macros are all treated as the number zero.
This patch may fix https://sourceware.org/bugzilla/show_bug.cgi?id=32776.
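A sketch of the pitfall (DEFAULT_ABI_BASE is a hypothetical macro):

    /* As an enumerator, ABI_BASE_LP64D is invisible to the preprocessor: */
    enum { ABI_BASE_LP64D = 0 };
    #if DEFAULT_ABI_BASE == ABI_BASE_LP64D
    /* Both identifiers evaluate as 0 here, so this branch is always
       taken, whatever the real ABI is.  */
    #endif

    /* As a macro, the comparison means what it says: */
    #define ABI_BASE_LP64D 0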
gcc/ChangeLog:
* config/loongarch/loongarch-def.h (ABI_BASE_LP64D): New macro.
(ABI_BASE_LP64F): New macro.
(ABI_BASE_LP64S): New macro.
(N_ABI_BASE_TYPES): New macro.