git.ipfire.org Git - thirdparty/gcc.git/log

Make fur_edge accessible.

Move the decl of fur_edge out of the source file into the header file.

* gimple-range-fold.cc (class fur_edge): Relocate from here.
(fur_edge::fur_edge): Also move to:
* gimple-range-fold.h (class fur_edge): Relocate to here.
(fur_edge::fur_edge): Likewise.

libstdc++: Fix up 117406.cc test [PR117406]

Christophe mentioned in bugzilla that the test FAILs on aarch64
because it uses INT_MAX without including <climits>.
Apparently during my testing it got the macro because the testsuite
preincludes
-include bits/stdc++.h
and that includes <climits>; I don't know why that didn't happen on aarch64.
In any case, I could either add #include <climits> or, because the
test already has #include <limits>, replace the uses of INT_MAX
with std::numeric_limits<int>::max(), which should be the same thing.
I've done the latter, but if you prefer
#include <climits>
I can surely add that instead.

2024-11-04 Jakub Jelinek <jakub@redhat.com>

PR libstdc++/117406
* testsuite/26_numerics/headers/cmath/117406.cc: Use
std::numeric_limits<int>::max() instead of INT_MAX.

(cherry picked from commit afcbf4dd27c147eb7d8f84e1a41c021eddec777e)
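
As a rough illustration of why the two spellings are interchangeable, here is a minimal sketch that is not taken from the test itself. The helper name largest_int is hypothetical; the only assumption is that both <climits> and <limits> are available.

```cpp
#include <cassert>
#include <climits>   // INT_MAX, included explicitly instead of relying on a preinclude
#include <limits>    // std::numeric_limits

// Hypothetical helper: the largest int, spelled without the INT_MAX macro.
constexpr int largest_int()
{
  return std::numeric_limits<int>::max();
}

// Both spellings must denote the same value.
static_assert(largest_int() == INT_MAX,
              "numeric_limits<int>::max() equals INT_MAX");
```

A test that only needs the value can therefore drop the macro and depend on <limits> alone.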

libstdc++: Fix up std::{,b}float16_t std::{ilogb,l{,l}r{ound,int}} [PR117406]

These overloads incorrectly cast the result of the float __builtin_*
to _Float16 or __gnu_cxx::__bfloat16_t.  For std::ilogb that changes
behavior for the INT_MAX return because that isn't representable in
either of the floating point formats; for the others it is, I think,
just a very inefficient hop from int/long/long long to std::{,b}float16_t
and back.  I mean for the round/rint cases, either the argument is small
and then the return value should be representable in the floating point
format too, or it is so large that the argument is already integral
and then it should just return the argument even with the round trips.
The result for too large values is unspecified, unlike for ilogb.

2024-11-02  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/117406
* include/c_global/cmath (std::ilogb(_Float16), std::llrint(_Float16),
std::llround(_Float16), std::lrint(_Float16), std::lround(_Float16)):
Don't cast __builtin_* return to _Float16.
(std::ilogb(__gnu_cxx::__bfloat16_t),
std::llrint(__gnu_cxx::__bfloat16_t),
std::llround(__gnu_cxx::__bfloat16_t),
std::lrint(__gnu_cxx::__bfloat16_t),
std::lround(__gnu_cxx::__bfloat16_t)): Don't cast __builtin_* return to
__gnu_cxx::__bfloat16_t.
* testsuite/26_numerics/headers/cmath/117406.cc: New test.

(cherry picked from commit 36a9e2b22596711455e702ea5a5a3f26e145321c)
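
The representability problem can be sketched with ordinary float as a stand-in for the 16-bit formats, which can represent even fewer integers exactly. This is an illustration only, not the libstdc++ code; the function names are hypothetical, and the sketch assumes a 32-bit int and an IEEE binary32 float.

```cpp
#include <cassert>
#include <climits>

// Returns true if V is unchanged by a hop through float.
// float has a 24-bit significand, so large odd integers get rounded.
bool survives_float_hop(int v)
{
  float narrowed = static_cast<float>(v);   // may round to a nearby value
  return static_cast<double>(narrowed) == static_cast<double>(v);
}

// INT_MAX is the value std::ilogb may need to return unchanged.
bool int_max_survives()
{
  return survives_float_hop(INT_MAX);
}
```

Small values pass through unchanged, while INT_MAX does not, which is why returning the builtin's integer result directly is the safer choice.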

expand: Fix up expansion of VIEW_CONVERT_EXPR to BITINT_TYPE [PR117354]

The following testcase ICEs, because when trying to expand the
VIEW_CONVERT_EXPR operand which is SSA_NAME defined to
V32QI or V4DI MEM_REF which is aligned just to 8 bytes we force
it as unaligned into a register, but then try to call extract_bit_field
from the V32QI or V4DI register to BLKmode.  extract_bit_field obviously
doesn't support BLKmode extraction and so ICEs.

The second hunk fixes the ICE by not calling extract_bit_field when
it can't handle it, the last if will handle it properly by storing
it to memory and using BLKmode access to the copy.

The first hunk is an optimization, if mode is BLKmode, by setting
inner_reference_p argument to expand_expr_real we avoid the
expand_misaligned_mem_ref calls which load it from memory into a register.

2024-10-31  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/117354
* expr.cc (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: Pass
true as inner_reference_p argument to expand_expr_real if
mode is BLKmode.  Don't call extract_bit_field if mode is BLKmode.

* gcc.dg/bitint-113.c: New test.

(cherry picked from commit b39f62ff739e9ffea0e6485667f15b985f8cd63d)

function: Call do_pending_stack_adjust in assign_parms [PR117296]

Functions called by assign_parms call emit_block_move in two places;
on some targets those moves can be expanded as calls and can result in
pending stack adjustment.

Now, during expansion we normally call do_pending_stack_adjust at the end
of expansion of each basic block or before emitting code that will branch
and/or has labels, and when emitting labels we assert that there are no
pending stack adjustments.

assign_parms is expanded before the first basic block and if the first
basic block starts with a label and at least one of those emit_block_move
calls resulted in the need of pending stack adjustments, we ICE when
emitting that label.

The following patch fixes that by calling do_pending_stack_adjust after
the assign_parms potential emit_block_move calls.

2024-10-30 Jakub Jelinek <jakub@redhat.com>

PR target/117296
* function.cc (assign_parms): Call do_pending_stack_adjust.

* gcc.target/i386/pr117296.c: New test.

(cherry picked from commit fccef0c4ed0119ac53940bdb3838052339cf14a2)

libstdc++: Use if consteval rather than if (std::__is_constant_evaluated()) for {,b}float16_t nextafter [PR117321]

The nextafter_c++23.cc testcase fails to link at -O0.
The problem is that even though std::__is_constant_evaluated() has
always_inline attribute, that at -O0 just means that we inline the
call, but its result is still assigned to a temporary which is tested
later, nothing at -O0 propagates that false into the if and optimizes
away the if body.  And the __builtin_nextafterf16{,b} calls are meant
to be used solely for constant evaluation, the C libraries don't
define nextafterf16 these days.

As __STDCPP_FLOAT16_T__ and __STDCPP_BFLOAT16_T__ are predefined right
now only by GCC, not by clang which doesn't implement the extended floating
point types paper, and as they are predefined in C++23 and later modes only,
I think we can just use if consteval which is folded already during the FE
and the body isn't included even at -O0.  I've added a feature test for
that just in case clang implements those and implements those in some weird
way.  Note, if (__builtin_is_constant_evaluated()) would work correctly too,
that is also folded to false at gimplification time and the corresponding
if block not emitted at all.  But for -O0 it can't be wrapped into a helper
inline function.

2024-10-29  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/117321
* include/c_global/cmath (nextafter(_Float16, _Float16)): Use
if consteval rather than if (std::__is_constant_evaluated()) around
the __builtin_nextafterf16 call.
(nextafter(__gnu_cxx::__bfloat16_t, __gnu_cxx::__bfloat16_t)): Use
if consteval rather than if (std::__is_constant_evaluated()) around
the __builtin_nextafterf16b call.
* testsuite/26_numerics/headers/cmath/117321.cc: New test.

(cherry picked from commit 5e247ac0c28b9a2662f99c4a5420c5f7c2d0c6bd)

Add regression test

This is for the latest fix made to Selected_Length_Checks in Checks.

gcc/testsuite
* gnat.dg/specs/array7.ads: New test.

ada: Fix internal error on concatenation of discriminant-dependent component

This only occurs with optimization enabled, but the expanded code is always
wrong because it reuses the formal parameter of an initialization procedure
associated with a discriminant (a discriminal in GNAT parlance) outside of
the initialization procedure.

gcc/ada/

* checks.adb (Selected_Length_Checks.Get_E_Length): For a
component of a record with discriminants and if the expression is
a selected component, try to build an actual subtype from its
prefix instead of from the discriminal.

Daily bump.

i386: Do not allow pointer conversion for CMPccXADD intrin under -O0

When the intrinsic is implemented as a macro, the pointer is converted to
a pointer to a wider type without considering whether the upper bits are
cleared or not.  That leads to an unexpected cmp result.

After this change, it will throw an incompatible pointer type error just
like -O2 does currently.

gcc/ChangeLog:

* config/i386/cmpccxaddintrin.h (_cmpccxadd_epi32): Do not do
type conversion for pointer.
(_cmpccxadd_epi64): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/cmpccxadd-1b.c: New test.

Fortran: Fix ICE with structure constructor in data statement [PR79685]

2024-10-25 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/79685
* decl.cc (match_data_constant): Find the symtree instead of
the symbol so that use-renamed symbols are found. Pass this and
the derived type to gfc_match_structure_constructor.
* match.h: Update prototype of gfc_match_structure_constructor.
* primary.cc (gfc_match_structure_constructor): Remove call to
gfc_get_ha_sym_tree and use caller supplied symtree instead.

gcc/testsuite/
PR fortran/79685
* gfortran.dg/use_rename_13.f90: New test.

(cherry picked from commit 6cb1da72cac166bd3b005c0430557b68b9761da5)

[APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue

According to the APX spec, pushp/popp pairs should be matched,
otherwise the PPX hint cannot take effect, causing performance loss.

In ix86_expand_epilogue, there are several optimizations that may
cause the epilogue to use mov to restore the regs. Check whether PPX was
applied and, if so, prevent usage of mov/leave in the epilogue. Also do
not use PPX for eh_return.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_expand_prologue): Set apx_ppx_used
flag in m.fs with TARGET_APX_PPX && !crtl->calls_eh_return.
(ix86_emit_save_regs): Emit ppx only when
TARGET_APX_PPX && !crtl->calls_eh_return.
(ix86_expand_epilogue): Don't restore reg using mov when
apx_ppx_used flag is true.
* config/i386/i386.h (struct machine_frame_state):
Add apx_ppx_used flag.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ppx-2.c: New test.
* gcc.target/i386/apx-ppx-3.c: Likewise.

(cherry picked from commit 8e72b1bb3896f6e8d4f4679cbcfbc2a8212d04f9)

Daily bump.

rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]

Only disable shrink-wrapping when using -mrop-protect when we know we
will be emitting the ROP-protect hash instructions (ie, non-leaf functions).

2024-06-17 Peter Bergner <bergner@linux.ibm.com>

gcc/
PR target/114759
* config/rs6000/rs6000.cc (rs6000_override_options_after_change): Move
the disabling of shrink-wrapping from here....
* config/rs6000/rs6000-logue.cc (rs6000_emit_prologue): ...to here.

gcc/testsuite/
PR target/114759
* gcc.target/powerpc/pr114759-1.c: New test.

(cherry picked from commit 0451bc503da9c858e9f1ddfb8faec367c2e032c8)

aarch64: Forbid F64MM permutes in streaming mode

The current code was based on an early version of the SME spec,
which allowed the .Q forms of TRN1, TRN2, UZP1, UZP2, ZIP1, and ZIP2
to be used in streaming mode. We should now forbid them instead;
see https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/TRN1--TRN2--vectors---Interleave-even-or-odd-elements-from-two-vectors-?lang=en
and the corresponding entries for the others.

gcc/
* config/aarch64/aarch64-sve-builtins-base.def (svtrn1q, svtrn2q)
(svuzp1q, svuzp2q, svzip1q, svzip2q): Require SM_OFF.

gcc/testsuite/
* g++.target/aarch64/sve/aarch64-ssve.exp: Add tests for trn[12]q,
uzp[12]q, and zip[12]q.
* gcc.target/aarch64/sve/acle/asm/trn1q_bf16.c: Skip for
STREAMING_COMPATIBLE.
* gcc.target/aarch64/sve/acle/asm/trn1q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn1q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/trn2q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp1q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/uzp2q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip1q_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/zip2q_u8.c: Likewise.

(cherry picked from commit e8fa15ea01439bccf15ae9b9a4d63ac92586c2c5)

Fix function multiversioning dispatcher link error with LTO

We forgot to apply DECL_EXTERNAL to the __init_cpu_features_resolver decl. When
building with LTO, the linker cannot find the
__init_cpu_features_resolver.lto_priv* symbol, causing the link error.

This patch fixes this by setting DECL_EXTERNAL on the decl. To avoid a "used
but never defined" warning for this symbol, we also set TREE_PUBLIC on the decl
and give it hidden visibility. The same fix is applied to the
__aarch64_cpu_features identifier.

Minimal steps to reproduce the bug:

echo '__attribute__((target_clones("default", "aes"))) void func1() { }' > 1.c
echo '__attribute__((target_clones("default", "aes"))) void func2() { }' > 2.c
echo 'void func1();void func2();int main(){func1();func2();return 0;}' > main.c
gcc -flto -c 1.c 2.c
gcc -flto main.c 1.o 2.o

Fixes: 0cfde688e213 ("[aarch64] Add function multiversioning support")
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:

* config/aarch64/aarch64.cc (dispatch_function_versions): Add
DECL_EXTERNAL, TREE_PUBLIC and hidden DECL_VISIBILITY to
__init_cpu_features_resolver and __aarch64_cpu_features.

(cherry picked from commit 875279ff3ee3b4135401286b8378087a24fd0f8d)

Daily bump.

jit: fix leak of pending_assemble_externals_set [PR117275]

My recent r15-4580-g779c0390e3b57d fix for resetting state in
varasm.cc introduced some noise to "make selftest-valgrind" and,
presumably, a memory leak in libgccjit:

==2462086== 160 (56 direct, 104 indirect) bytes in 1 blocks are definitely lost in loss record 248 of 352
==2462086==    at 0x5270E7D: operator new(unsigned long) (vg_replace_malloc.c:342)
==2462086==    by 0x1D1EB89: init_varasm_once() (varasm.cc:6806)
==2462086==    by 0x181C845: backend_init() (toplev.cc:1826)
==2462086==    by 0x181D41A: do_compile() (toplev.cc:2193)
==2462086==    by 0x181D99C: toplev::main(int, char**) (toplev.cc:2371)
==2462086==    by 0x378391D: main (main.cc:39)

Fixed thusly.

gcc/ChangeLog:
PR jit/117275
* varasm.cc (process_pending_assemble_externals): Reset
pending_assemble_externals_set to nullptr after deleting it.
(varasm_cc_finalize): Delete pending_assemble_externals_set.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
(cherry picked from commit 7f41203f08b9948c1c636dc9d66571121c6c7793)
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
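
The delete-and-reset pattern described in the ChangeLog can be sketched as follows. This is a simplified model, not the varasm.cc code: pending_set, init_once, process_pending, finalize and state_is_clean are hypothetical stand-ins.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the per-compilation set kept by the real code.
static std::set<int> *pending_set = nullptr;

void init_once()            // allocate per-compilation state
{
  pending_set = new std::set<int>;
}

void process_pending()      // consume the set, then release it
{
  delete pending_set;
  pending_set = nullptr;    // reset so a later finalize cannot double-delete
}

void finalize()             // called at the end of every compilation
{
  delete pending_set;       // deleting nullptr is a no-op
  pending_set = nullptr;
}

bool state_is_clean() { return pending_set == nullptr; }
```

With both halves in place, finalize is safe whether or not the set was already processed, and a compile that never reaches process_pending no longer leaks the allocation.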

jit: reset state in varasm.cc [PR117275]

PR jit/117275 reports various jit test failures seen on
powerpc64le-unknown-linux-gnu due to hitting this assertion
in varasm.cc on the 2nd compilation in a process:

#2  0x00007ffff63e67d0 in assemble_external_libcall (fun=0x7ffff2a4b1d8)
    at ../../src/gcc/varasm.cc:2650
2650          gcc_assert (!pending_assemble_externals_processed);
(gdb) p pending_assemble_externals_processed
$1 = true

We're not properly resetting state in varasm.cc after a compile
for libgccjit.

Fixed thusly.

gcc/ChangeLog:
PR jit/117275
* toplev.cc (toplev::finalize): Call varasm_cc_finalize.
* varasm.cc (varasm_cc_finalize): New.
* varasm.h (varasm_cc_finalize): New decl.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
(cherry picked from commit 779c0390e3b57d1eebd41bbfe43d1f329c91de6c)
Signed-off-by: David Malcolm <dmalcolm@redhat.com>

testsuite, jit: fix test-error-pr63969-missing-driver.c

jit.dg/test-error-pr63969-missing-driver.c tries to break PATH and
verify that an error is generated when using an external driver.

However it does this by unsetting PATH, and so the test could
accidentally find the driver if the system supplies a default and the
driver happens to be installed in that path (reported as rhbz#2318021).

Fix the test by instead setting PATH to a bogus value.

gcc/testsuite/ChangeLog:
* jit.dg/test-error-pr63969-missing-driver.c (create_code): When
breaking PATH, use setenv with a bogus value, rather than
unsetenv, in case the system uses a default path that contains
the driver binary.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
(cherry picked from commit f8dcb559e615dbb4557a23363f9532a3544a7241)
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
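
A minimal sketch of the setenv approach, assuming a POSIX system. The bogus value and the helper names are hypothetical; the actual test may use a different string.

```cpp
#include <cassert>
#include <stdlib.h>   // setenv (POSIX), getenv
#include <string.h>   // strcmp

// Hypothetical bogus value; any directory that cannot exist would do.
static const char *const bogus_path = "/this/path/does/not/exist";

// Point PATH at a directory that cannot contain the driver.  Unlike
// unsetenv ("PATH"), this leaves no room for a system default search path.
bool break_path()
{
  return setenv("PATH", bogus_path, /*overwrite=*/1) == 0;
}

// Check that the environment now holds the bogus value.
bool path_is_bogus()
{
  const char *p = getenv("PATH");
  return p != nullptr && strcmp(p, bogus_path) == 0;
}
```

Since PATH stays set, a lookup that would otherwise fall back to a built-in default is forced to search only the nonexistent directory.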

aarch64: Assume alias conflict if common address reg changes [PR116783]

As the PR shows, pair fusion was tricking memory_modified_in_insn_p into
returning false when a common base register (in this case, x1) was
modified between the mem and the store insn. This led to wrong code as
the accesses really did alias.

To avoid this sort of problem, this patch avoids invoking RTL alias
analysis altogether (and assumes an alias conflict) if the two insns to
be compared share a common address register R, and the insns see different
definitions of R (i.e. it was modified in between).

This is a backport (but not a straight cherry pick) of
r15-4518-gc0e54ce1999ccf2241f74c5188b11b92e5aedc1f.

gcc/ChangeLog:

PR rtl-optimization/116783
* config/aarch64/aarch64-ldp-fusion.cc
(def_walker::cand_addr_uses): New.
(def_walker::def_walker): Add parameter for candidate address
uses.
(def_walker::alias_conflict_p): Declare.
(def_walker::addr_reg_conflict_p): New.
(def_walker::conflict_p): New.
(store_walker::store_walker): Add parameter for candidate
address uses and pass to base ctor.
(store_walker::conflict_p): Rename to ...
(store_walker::alias_conflict_p): ... this.
(load_walker::load_walker): Add parameter for candidate
address uses and pass to base ctor.
(load_walker::conflict_p): Rename to ...
(load_walker::alias_conflict_p): ... this.
(ldp_bb_info::try_fuse_pair): Collect address register
uses for candidate insns and pass down to alias walkers.

gcc/testsuite/ChangeLog:

PR rtl-optimization/116783
* g++.dg/torture/pr116783.C: New test.
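
The rule "assume a conflict if a shared address register sees different definitions" can be modelled in a few lines. This is a toy sketch and not the aarch64-ldp-fusion.cc implementation: the addr_uses type and the integer definition ids are assumptions, and only the function name is borrowed from the ChangeLog.

```cpp
#include <cassert>
#include <map>

// Hypothetical model: for each address register an insn uses, the id of the
// definition of that register which the insn sees.
using addr_uses = std::map<int, int>;   // register number -> definition id

// True if A and B share an address register but see different definitions of
// it, i.e. the register was modified in between.  In that case the caller
// would assume a conflict instead of consulting alias analysis.
bool addr_reg_conflict_p(const addr_uses &a, const addr_uses &b)
{
  for (const auto &use : a)
    {
      auto it = b.find(use.first);
      if (it != b.end() && it->second != use.second)
        return true;
    }
  return false;
}
```

Two insns that both address through x1 but see different definitions of it are reported as conflicting; insns with the same definition, or with no register in common, fall through to the usual alias check.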

Fix ICE due to subreg:us_truncate.

force_operand ICEs when its input
is (subreg:DI (us_truncate:V8QI)), probably because that is an
invalid rtx, so refine the backend patterns to avoid it.

gcc/ChangeLog:

PR target/117318
* config/i386/sse.md (*avx512vl_<code>v2div2qi2_mask_store_1):
Rename to ..
(avx512vl_<code>v2div2qi2_mask_store_1): .. this.
(avx512vl_<code>v2div2qi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code><mode>v4qi2_mask_store_1): Rename to ..
(avx512vl_<code><mode>v4qi2_mask_store_1): .. this.
(avx512vl_<code><mode>v4qi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code><mode>v8qi2_mask_store_1): Rename to ..
(avx512vl_<code><mode>v8qi2_mask_store_1): .. this.
(avx512vl_<code><mode>v8qi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code><mode>v4hi2_mask_store_1): Rename to ..
(avx512vl_<code><mode>v4hi2_mask_store_1): .. this.
(avx512vl_<code><mode>v4hi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code>v2div2hi2_mask_store_1): Rename to ..
(avx512vl_<code>v2div2hi2_mask_store_1): .. this.
(avx512vl_<code>v2div2hi2_mask_store_2): Change to
define_expand.
(*avx512vl_<code>v2div2si2_mask_store_1): Rename to ..
(avx512vl_<code>v2div2si2_mask_store_1): .. this.
(avx512vl_<code>v2div2si2_mask_store_2): Change to
define_expand.
(*avx512f_<code>v8div16qi2_mask_store_1): Rename to ..
(avx512f_<code>v8div16qi2_mask_store_1): .. this.
(avx512f_<code>v8div16qi2_mask_store_2): Change to
define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117318.c: New test.

(cherry picked from commit bc0eeccf27a084461a2d5661e23468350acb43da)

Daily bump.

Fix miscompilation of function containing __builtin_unreachable

This is a wrong-code generation on the SPARC for a function containing
a call to __builtin_unreachable caused by the delay slot scheduling pass,
and more specifically the find_end_label function which has these lines:

  /* Otherwise, see if there is a label at the end of the function. If there
     is, it must be that RETURN insns aren't needed, so that is our return
     label and we don't have to do anything else.  */

The comment was correct 20 years ago but no longer is nowadays in the
presence of RTL epilogues and calls to __builtin_unreachable, so the
patch just removes the associated two lines of code:

  else if (LABEL_P (insn))
    *plabel = as_a <rtx_code_label *> (insn);

and otherwise contains just adjustments to the commentary.

gcc/
PR rtl-optimization/117327
* reorg.cc (find_end_label): Do not return a dangling label at the
end of the function and adjust commentary.

gcc/testsuite/
* gcc.c-torture/execute/20241029-1.c: New test.

rs6000: Fix PTImode handling in power8 swap optimization pass [PR116415]

Our power8 swap optimization pass has some special handling for optimizing
swaps of TImode variables.  The test case reported in bugzilla uses a call
to  __atomic_compare_exchange, which introduces a variable of PTImode and
that does not get the same treatment as TImode leading to wrong code
generation.  The simple fix is to treat PTImode identically to TImode.

2024-08-23  Peter Bergner  <bergner@linux.ibm.com>

gcc/
PR target/116415
* config/rs6000/rs6000.h (TI_OR_PTI_MODE): New define.
* config/rs6000/rs6000-p8swap.cc (rs6000_analyze_swaps): Use it to
handle PTImode identically to TImode.

gcc/testsuite/
PR target/116415
* gcc.target/powerpc/pr116415.c: New test.

(cherry picked from commit 6e68c3df1540c5bafbb47343698bf4e270333fdb)

Daily bump.

testsuite: add testcase for fixed PR107467

PR107467 ended up being fixed by the fix for PR115110, but let's
add the testcase on top.

gcc/testsuite/ChangeLog:
PR tree-optimization/107467
PR middle-end/115110

* g++.dg/lto/pr107467_0.C: New test.

(cherry picked from commit 4e09ae37dbe0a10f48490214f50ff733cc92280a)

Revert "testsuite: add testcase for fixed PR107467"

This reverts commit 2f0d109bd871d11b5b93468f271aa6dc34ef88d8.

testsuite: add testcase for fixed PR107467

PR107467 ended up being fixed by the fix for PR115110, but let's
add the testcase on top.

gcc/testsuite/ChangeLog:
PR tree-optimization/107467
PR middle-end/115110

* g++.dg/lto/pr107467_0.C: New test.

Daily bump.

testsuite: Sanitize pacbti test cases for Cortex-M

Some of the test cases were scanning for "bti", but that would
incorrectly match ".arch_extension pacbti" as well.

gcc/testsuite/ChangeLog:

* gcc.target/arm/bti-1.c: Check for asm instructions starting
with a tab.
* gcc.target/arm/bti-2.c: Likewise.
* gcc.target/arm/pac-1.c: Likewise.
* gcc.target/arm/pac-2.c: Likewise.
* gcc.target/arm/pac-3.c: Likewise.
* gcc.target/arm/pac-4.c: Likewise.
* gcc.target/arm/pac-6.c: Likewise.
* gcc.target/arm/pac-7.c: Likewise.
* gcc.target/arm/pac-8.c: Likewise.
* gcc.target/arm/pac-9.c: Likewise.
* gcc.target/arm/pac-10.c: Likewise.
* gcc.target/arm/pac-11.c: Likewise.
* gcc.target/arm/pac-15.c: Likewise.
* gcc.target/arm/pac-sibcall.c: Likewise.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Co-authored-by: Yvan ROUX <yvan.roux@foss.st.com>
(cherry picked from commit 6ad29a858bac7cf9e765925cf5f6945e20f085be)
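
The false match is easy to reproduce with any regular expression engine. The sketch below uses std::regex purely as an analogy, since the testsuite itself uses Tcl regular expressions; line_matches is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if PATTERN matches anywhere in LINE (ECMAScript syntax).
bool line_matches(const std::string &line, const std::string &pattern)
{
  return std::regex_search(line, std::regex(pattern));
}
```

A bare "bti" pattern matches the directive line because of the "pacbti" suffix, while a pattern that requires a leading tab only matches a line where the instruction mnemonic directly follows the tab.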

Daily bump.

Assorted --disable-checking fixes [PR117249]

We currently have 3 different definitions of the gcc_assert macro: one used most
of the time (unless --disable-checking), which evaluates the condition and
checks it at runtime, then one for --disable-checking with GCC 4.5+,
which looks like
((void)(UNLIKELY (!(EXPR)) ? __builtin_unreachable (), 0 : 0))
and a fallback one
((void)(0 && (EXPR)))
Now, the last one actually doesn't evaluate any of the side-effects in the
argument, it just quiets unused var/parameter warnings.
I've tried to replace the middle definition with
({ [[assume (EXPR)]]; (void) 0; })
for compilers which support assume attribute and statement expressions
(surprisingly quite a few spots use gcc_assert inside of comma expressions),
but ran into PR117287, so for now such a change isn't being proposed.

The following patch attempts to move important side-effects out of gcc_assert
arguments.

Bootstrapped/regtested on x86_64-linux and i686-linux with normal
--enable-checking=yes,rtl,extra, plus additionally I've attempted to do
x86_64-linux bootstrap with --disable-checking and gcc_assert changed to the
((void)(0 && (EXPR)))
version when --disable-checking.  That version ran into spurious middle-end
warnings
../../gcc/../include/libiberty.h:733:36: error: argument to ‘alloca’ is too large [-Werror=alloca-larger-than=]
../../gcc/tree-ssa-reassoc.cc:5659:20: note: in expansion of macro ‘XALLOCAVEC’
  int op_num = ops.length ();
  int op_normal_num = op_num;
  gcc_assert (op_num > 0);
  int stmt_num = op_num - 1;
  gimple **stmts = XALLOCAVEC (gimple *, stmt_num);
where we have gcc_assert exactly to work-around middle-end warnings.
Guess I'd need to also disable -Werror for this experiment, which actually
isn't a problem with unmodified system.h, because even for
--disable-checking we use the __builtin_unreachable at least in
stage2/stage3 and so the warnings aren't emitted, and even if it used
[[assume ()]]; it would work too because in stage2/stage3 we could again
rely on assume and statement expression support.

2024-10-25  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/117249
* tree-ssa-structalias.cc (insert_vi_for_tree): Move put calls out of
gcc_assert.
* lto-cgraph.cc (lto_symtab_encoder_delete_node): Likewise.
* gimple-ssa-strength-reduction.cc (get_alternative_base,
add_cand_for_stmt): Likewise.
* tree-eh.cc (add_stmt_to_eh_lp_fn): Likewise.
* except.cc (duplicate_eh_regions_1): Likewise.
* tree-ssa-reassoc.cc (insert_operand_rank): Likewise.
* config/nvptx/nvptx.cc (nvptx_expand_call): Use == rather than = in
gcc_assert.
* opts-common.cc (jobserver_info::disconnect): Call close outside of
gcc_assert and only check result in it.
(jobserver_info::return_token): Call write outside of gcc_assert and
only check result in it.
* genautomata.cc (output_default_latencies): Move j++ side-effect
outside of gcc_assert.
* tree-ssa-loop-ivopts.cc (get_alias_ptr_type_for_ptr_address): Use
== rather than = in gcc_assert.
* cgraph.cc (symbol_table::create_edge): Move ++edges_max_uid
side-effect outside of gcc_assert.

(cherry picked from commit e2a8772c9328960c625f5b95091d4312efa0e284)
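
The effect of the fallback definition on side-effects can be shown with a small sketch. SKETCH_ASSERT mirrors the fallback shape quoted in the commit message; the macro name, the counter and the put function are hypothetical.

```cpp
#include <cassert>

// Hypothetical macro with the fallback shape: EXPR is type-checked but
// never evaluated.
#define SKETCH_ASSERT(EXPR) ((void)(0 && (EXPR)))

static int calls = 0;
bool put() { ++calls; return true; }   // stand-in for a call with a side-effect

// Buggy shape: the side-effect disappears together with the check.
int side_effect_inside()
{
  calls = 0;
  SKETCH_ASSERT (put ());
  return calls;
}

// Fixed shape: perform the call first, then only check its result.
int side_effect_outside()
{
  calls = 0;
  bool ok = put ();
  SKETCH_ASSERT (ok);
  return calls;
}
```

In the first function the call never happens, so the counter stays at zero; in the second the call is made exactly once no matter which assert definition is in effect.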

c++: Further fix for get_member_function_from_ptrfunc [PR117259]

The following testcase shows that the previous get_member_function_from_ptrfunc
changes weren't sufficient and we still have cases where
-fsanitize=undefined with pointers to member functions can cause wrong code
being generated and related false positive warnings.

The problem is that save_expr doesn't always create SAVE_EXPR, it can skip
some invariant arithmetics and in the end it could be really large
expressions which would be evaluated several times (and what is worse, with
-fsanitize=undefined those expressions then can have SAVE_EXPRs added to
their subparts for -fsanitize=bounds or -fsanitize=null or
-fsanitize=alignment instrumentation).  Tried to just build1 a SAVE_EXPR
+ add TREE_SIDE_EFFECTS instead of save_expr, but that doesn't work either,
because cp_fold happily optimizes those SAVE_EXPRs away when it sees
SAVE_EXPR operand is tree_invariant_p.

So, the following patch instead of using save_expr or building SAVE_EXPR
manually builds a TARGET_EXPR.  Both types are pointers, so it doesn't need
to be destroyed in any way, but TARGET_EXPR is what doesn't get optimized
away immediately.

2024-10-24  Jakub Jelinek  <jakub@redhat.com>

PR c++/117259
* typeck.cc (get_member_function_from_ptrfunc): Use force_target_expr
rather than save_expr for instance_ptr and function.  Don't call it
for TREE_CONSTANT.

* g++.dg/ubsan/pr117259.C: New test.

(cherry picked from commit b25d3201b6338d9f71c64f524ca2974d9a1f38e8)

asan: Fix up build_check_stmt gsi handling [PR117209]

gsi_safe_insert_before properly updates gsi_bb in gimple_stmt_iterator
in case it splits objects, but unfortunately build_check_stmt was in
some places (but not others) using a copy of the iterator rather than
the iterator passed from callers and so didn't propagate that to callers.
I guess it didn't matter much before when it was just using
gsi_insert_before as that really didn't change the iterator.
The !before_p case is apparently dead code, nothing is calling it with
before_p=false since around 4.9.

2024-10-24 Jakub Jelinek <jakub@redhat.com>

PR sanitizer/117209
* asan.cc (maybe_cast_to_ptrmode): Formatting fix.
(build_check_stmt): Don't copy *iter into gsi, perform all
the updates on iter directly.

* gcc.dg/asan/pr117209.c: New test.

(cherry picked from commit 885143fa77599c44bfdd4e8e6b6987b7824db6ba)
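
The difference between updating a copy of the iterator and updating the caller's iterator can be sketched like this. It is a toy model with hypothetical names, not the asan.cc code; the iter struct only tracks a block index.

```cpp
#include <cassert>

// Hypothetical iterator: only the block index matters for this sketch.
struct iter { int bb; };

// Stand-in for an insertion that may split the block and so moves the iterator.
void insert_and_maybe_split(iter *it) { it->bb += 1; }

// Buggy shape: works on a copy, so the caller never sees the new block.
void check_via_copy(iter *it)
{
  iter gsi = *it;
  insert_and_maybe_split(&gsi);
}

// Fixed shape: perform all updates on the caller's iterator directly.
void check_in_place(iter *it)
{
  insert_and_maybe_split(it);
}
```

After check_via_copy the caller's iterator still names the old block, whereas check_in_place leaves it pointing at the block the insertion ended up in.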

Add regression test

gcc/testsuite
PR ada/116551
* gnat.dg/specs/vfa3.ads: New test.

ada: Fix internal error on bit-packed array type with Volatile_Full_Access

The problem occurs when the component type is a record type with default
values for the initialization procedure of the (base) array type, because
the compiler is trying to generate a full access for a parameter of the
base array type, which does not make sense.

gcc/ada/ChangeLog:

PR ada/116551
* gcc-interface/trans.cc (node_is_atomic) <N_Identifier>: Return
false if the type of the entity is an unconstrained array type.
(node_is_volatile_full_access) <N_Identifier>: Likewise.

AVR: target/116953 - Restore recog_data after calling jump_over_one_insn_p.

The previous fix for PR116953 is incomplete because references to
recog_data are escaping avr_out_sbxx_branch() in the form of %-operands
in the returned asm code template. This patch reverts the previous fix,
and re-extracts the operands by means of extract_constrain_insn_cached()
after the call of jump_over_one_insn_p().

PR target/116953
gcc/
* config/avr/avr.cc (avr_out_sbxx_branch): Revert previous fix
for PR116953 (r15-4078). Run extract_constrain_insn_cached
on the current insn after calling jump_over_one_insn_p.

(cherry picked from commit ca0ab7a0ac18911181e9161cfb8b87fb90039612)

Fortran: Simplify len_trim with array ref and fix mapping bug [PR84868].

2024-07-16 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/84868
* simplify.cc (gfc_simplify_len_trim): If the argument is an
element of a parameter array, simplify all the elements and
build a new parameter array to hold the result, after checking
that it doesn't already exist.
* trans-expr.cc (gfc_get_interface_mapping_array): If a string
length is available, use it for the typespec.
(gfc_add_interface_mapping): Supply the se string length.

gcc/testsuite/
PR fortran/84868
* gfortran.dg/pr84868.f90: New test.

(cherry picked from commit 9f966b6a8ff0244dd6f8bf36d876799d5f9bbaee)

c++: remove dg-warning [PR117274]

This warning was added for GCC 15, don't expect it.

PR c++/117274
PR c++/117107

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/decomp10.C: Remove captured binding warning.

Daily bump.

Fix ICE due to isa mismatch for the builtins.

gcc/ChangeLog:

PR target/117240
* config/i386/i386-builtin.def: Add avx/avx512f to vaes
ymm/zmm builtins.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117240_avx.c: New test.
* gcc.target/i386/pr117240_avx512f.c: New test.

(cherry picked from commit 403e361d5aa620e77c9832578b2409a0fdd79d96)

c-family: Fix up -Wsizeof-pointer-memaccess ICEs [PR117230]

In the following testcases, we ICE on all 4 function calls.
The problem is using TYPE_PRECISION on vector types (but guess it
would be similarly problematic on structures/unions/arrays).
The test only differentiates between suggestions of what to do: whether
to supply an explicit size, because sizeof (*p) for
{,{,un}signed }char *p is not very likely what the user wants, or
to dereference the pointer. So I think limiting that suggestion
to integral types is ok.

2024-10-22 Jakub Jelinek <jakub@redhat.com>

PR c/117230
* c-warn.cc (sizeof_pointer_memaccess_warning): Only compare
TYPE_PRECISION of TREE_TYPE (type) to precision of char if
TREE_TYPE (type) is integral type.

* c-c++-common/Wsizeof-pointer-memaccess5.c: New test.

(cherry picked from commit 5fd1c0c1b6968d55e3f997d67a4c149edf20c012)

match.pd: Further fma negation fixes [PR116891]

On Mon, Oct 14, 2024 at 08:53:29AM +0200, Jakub Jelinek wrote:
> >     PR middle-end/116891
> >     * match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
> >     Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
>
> Guess it would be nice to have a testcase which FAILs without the patch and
> PASSes with it, but it can be added later.

I've added such a testcase now, and additionally found the fix only fixed
one of the 4 problematic similar cases.

Here is a patch which fixes the others too and adds the testcases.
fma-pr116891.c FAILed without your patch, FAILs with your patch too (but
only due to the bar/baz/qux checks) and PASSes with the patch.

2024-10-15  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/116891
* match.pd ((negate (fmas@3 @0 @1 @2)) -> (IFN_FNMS @0 @1 @2)):
Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
((negate (IFN_FMS@3 @0 @1 @2)) -> (IFN_FNMA @0 @1 @2)): Likewise.
((negate (IFN_FNMA@3 @0 @1 @2)) -> (IFN_FMS @0 @1 @2)): Likewise.

* gcc.dg/pr116891.c: New test.
* gcc.target/i386/fma-pr116891.c: New test.

(cherry picked from commit 4366f0c7e296ea0d7279343c9b0a1d597588a1da)

middle-end/116891 - fix (negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)

Transforming -fma (-a, b, -c) to fma (a, b, c) is only valid when
not rounding towards -inf or +inf as the sign of the multiplication
changes.
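
A minimal sketch of why the sign matters, assuming an IEEE 754 target
whose fma honors the dynamic rounding mode.  The names FmaPair and
eval_both are hypothetical and not part of the patch; the volatile
temporaries are only there so the compiler cannot fold or simplify the
two forms into each other.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Results of the two forms the transformation treats as equivalent.
struct FmaPair
{
  double direct;   // fma (a, b, c)
  double negated;  // -fma (-a, b, -c)
};

// Evaluate both forms under ROUNDING_MODE and restore the previous mode.
// The volatile accesses keep the evaluation between the fesetround calls
// and stop the compiler from simplifying the negated form itself.
FmaPair eval_both (int rounding_mode, double a, double b, double c)
{
  volatile double va = a, vb = b, vc = c;
  volatile double direct, inner;
  const int old_mode = std::fegetround ();
  std::fesetround (rounding_mode);
  direct = std::fma (va, vb, vc);
  inner = std::fma (-va, vb, -vc);
  std::fesetround (old_mode);
  // Negation is exact, so it is safe to do it after restoring the mode.
  return FmaPair{direct, -inner};
}
```

With a = b = 1 and c = 2^-60 the exact result 1 + 2^-60 is not
representable.  Rounding towards +inf, the direct form would be expected
to round up to the next double above 1, while the inner fma of the
negated form rounds -(1 + 2^-60) up to -1, giving 1 after the negation.
In round-to-nearest both forms agree.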

PR middle-end/116891
* match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.

(cherry picked from commit c53bd48c6920bc1f4039b6682aafbf414a600e47)

Daily bump.

c++: non-dep structured binding decltype again [PR117107]

The patch for PR92687 handled the usual case of a decomp variable not being
in the table, but missed the case of there being nothing in the table yet.

PR c++/117107
PR c++/92687

gcc/cp/ChangeLog:

* decl.cc (lookup_decomp_type): Handle null table.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/decomp10.C: New test.

(cherry picked from commit 71e13ea134b04562f8f2cdd9c4a55dbb0905f96a)

c++/modules: Fix treatment of unnamed types [PR116929]

In r14-9530 we relaxed "depending on type with no-linkage" errors for
declarations that could actually be accessed from different TUs anyway.
However, this also enabled it for unnamed types, which never work.

In a normal module interface, an unnamed type is TU-local by
[basic.link] p15.2, and so cannot be exposed or the program is
ill-formed. We don't yet implement this checking but we should assume
that we will later; currently supporting this actually causes ICEs when
attempting to create the mangled name in some situations.

For a header unit, by [module.import] p5.3 it is unspecified whether two
TUs importing a header unit providing such a declaration are importing
the same header unit. In this case, we would require name mangling
changes to somehow allow the (anonymous) type exported by such a header
unit to correspond across different TUs in the presence of other
anonymous declarations, so for this patch just assume that this case
would be an ODR violation instead.

PR c++/116929

gcc/cp/ChangeLog:

* tree.cc (no_linkage_check): Anonymous types can't be accessed
in a different TU.

gcc/testsuite/ChangeLog:

* g++.dg/modules/linkage-1_a.C: Remove anonymous type test.
* g++.dg/modules/linkage-1_b.C: Likewise.
* g++.dg/modules/linkage-1_c.C: Likewise.
* g++.dg/modules/linkage-2.C: Add note about anonymous types.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 0173dcce92baa62a74929814a75edb75eeab1a54)

testsuite: arm: Use check-function-bodies in fp16-aapcs-* tests

Converted the tests to use check-function-bodies in order to ensure that
the sequence is correct.

gcc/testsuite/ChangeLog:

* gcc.target/arm/fp16-aapcs-1.c: Use check-function-bodies.
* gcc.target/arm/fp16-aapcs-2.c: Likewise.
* gcc.target/arm/fp16-aapcs-3.c: Likewise.
* gcc.target/arm/fp16-aapcs-4.c: Likewise.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit 205515da82a2914d765e74ba73fd2765e1254112)

testsuite: arm: Relax expected asm in bitfield* and union-2 tests

Below -O2, lsls/lsrs are preferred. For -O2 and above, lsl/lsr are
preferred.

gcc/testsuite/ChangeLog:

* gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c: Allow lsl and
lsr instructions.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-8.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/bitfield-and-union.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/union-2.c: Likewise.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit a79ca49b5ce0ad4738062572948e52485aa2da2b)

testsuite: arm: Use check-function-bodies in cmse-5 tests

Converted the tests to use check-function-bodies in order to ensure that
the sequence is correct.
This also allows both APSR_nzcvq and APSR_nzcvqg, as the target selector
does not work when -march and/or -mcpu override the target to test.

gcc/testsuite/ChangeLog:

* gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-5.c: Use
check-function-bodies.
* gcc.target/arm/cmse/mainline/8m/hard/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8m/soft/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8m/softfp-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8m/softfp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard-sp/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/hard/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/soft/cmse-5.c: Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp-sp/cmse-5.c:
Likewise.
* gcc.target/arm/cmse/mainline/8_1m/softfp/cmse-5.c: Likewise.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
(cherry picked from commit 835ad52fbb9c8a0bb4e713deb6c99679d8b77d60)

testsuite: Skip pr112305.c for -O[01] on simulators

gcc.dg/torture/pr112305.c contains an inner loop that executes
0x8000_0014 times and an outer loop that executes 5 times, giving about
10 billion total executions of the inner loop body.  At -O2 and above we
are able to remove the inner loop, but at -O1 we keep a no-op loop:

        dls     lr, r3
.L3:
        subs    r3, r3, #1
        le      lr, .L3

and at -O0 we of course don't optimise.

This can lead to long execution times on simulators, possibly
triggering a timeout.
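
The iteration count above can be checked with a few lines of
arithmetic.  This is only an illustration; the constant names are
hypothetical and do not appear in the test.

```cpp
#include <cassert>
#include <cstdint>

// Trip counts quoted in the commit message.
constexpr std::uint64_t inner_iterations = 0x80000014ULL;  // 2147483668
constexpr std::uint64_t outer_iterations = 5;

// Total number of times the inner loop body runs.
constexpr std::uint64_t total_inner_executions ()
{
  return inner_iterations * outer_iterations;
}

static_assert (total_inner_executions () == 10737418340ULL,
               "roughly 10.7 billion executions of the inner loop body");
```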

gcc/testsuite
* gcc.dg/torture/pr112305.c: Skip at -O0 and -O1 for simulators.

(cherry picked from commit 4e80432c52a18b92899244e8ce3c243f560766a6)

libstdc++: Implement LWG 3664 changes to ranges::distance

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h (__distance_fn::operator()):
Adjust iterator/sentinel overloads as per LWG 3664.
* testsuite/24_iterators/range_operations/distance.cc:
Test LWG 3664 example.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit 7c0d1e9f2a2f1d41d9eb755c36c871d92638c4b7)

libstdc++/ranges: Implement various small LWG issues

This implements the following small LWG issues:

  3848. adjacent_view, adjacent_transform_view and slide_view missing base accessor
  3851. chunk_view::inner-iterator missing custom iter_move and iter_swap
  3947. Unexpected constraints on adjacent_transform_view::base()
  4001. iota_view should provide empty
  4012. common_view::begin/end are missing the simple-view check
  4013. lazy_split_view::outer-iterator::value_type should not provide default constructor
  4035. single_view should provide empty
  4053. Unary call to std::views::repeat does not decay the argument
  4054. Repeating a repeat_view should repeat the view

libstdc++-v3/ChangeLog:

* include/std/ranges (single_view::empty): Define as per LWG 4035.
(iota_view::empty): Define as per LWG 4001.
(lazy_split_view::_OuterIter::value_type): Remove default
constructor and make other constructor private as per LWG 4013.
(common_view::begin): Disable non-const overload for simple
views as per LWG 4012.
(common_view::end): Likewise.
(adjacent_view::base): Define as per LWG 3848.
(adjacent_transform_view::base): Likewise.
(chunk_view::_InnerIter::iter_move): Define as per LWG 3851.
(chunk_view::_InnerIter::iter_swap): Likewise.
(slide_view::base): Define as per LWG 3848.
(repeat_view): Adjust deduction guide as per LWG 4053.
(_Repeat::operator()): Adjust single-parameter overload as per
LWG 4054.
* testsuite/std/ranges/adaptors/adjacent/1.cc: Verify existence
of base member function.
* testsuite/std/ranges/adaptors/adjacent_transform/1.cc: Likewise.
* testsuite/std/ranges/adaptors/chunk/1.cc: Test LWG 3851 example.
* testsuite/std/ranges/adaptors/slide/1.cc: Verify existence of
base member function.
* testsuite/std/ranges/iota/iota_view.cc: Test LWG 4001 example.
* testsuite/std/ranges/repeat/1.cc: Test LWG 4053/4054 examples.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit 20165d0107abd0f839f2519818b904f029f4ae55)

libstdc++: Add some missing ranges feature-test macro tests

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/contains/1.cc: Verify value of
__cpp_lib_ranges_contains.
* testsuite/25_algorithms/find_last/1.cc: Verify value of
__cpp_lib_ranges_find_last.
* testsuite/25_algorithms/iota/1.cc: Verify value of
__cpp_lib_ranges_iota.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit 8e0da56f18b3678beee9d2bae27e08a0e122573a)

libstdc++: Implement P2997R1 changes to the indirect invocability concepts

This implements the changes of this C++26 paper as a DR against C++20.

In passing this patch removes the std/ranges/version_c++23.cc test which
is now mostly obsolete after the version.def FTM refactoring, and instead
expands the __cpp_lib_ranges checks in another test so that it verifies
the exact value of the FTM on a per language version basis.

libstdc++-v3/ChangeLog:

* include/bits/iterator_concepts.h (indirectly_unary_invocable):
Relax as per P2997R1.
(indirectly_regular_unary_invocable): Likewise.
(indirect_unary_predicate): Likewise.
(indirect_binary_predicate): Likewise.
(indirect_equivalence_relation): Likewise.
(indirect_strict_weak_order): Likewise.
* include/bits/version.def (ranges): Update value for C++26.
* include/bits/version.h: Regenerate.
* testsuite/24_iterators/indirect_callable/p2997r1.cc: New test.
* testsuite/std/ranges/version_c++23.cc: Remove.
* testsuite/std/ranges/headers/ranges/synopsis.cc: Refine the
__cpp_lib_ranges checks.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit 620232426bd83a79c81cd2be6f485834c618e920)

libstdc++: Implement P2609R3 changes to the indirect invocability concepts

This implements the changes of this C++23 paper as a DR against C++20.

Note that after the later P2538R1 "ADL-proof std::projected" (which we
already implement), we can't use a simple partial specialization to match
specializations of the 'projected' alias template. So instead we identify
such specializations using a pair of distinguishing member aliases.

libstdc++-v3/ChangeLog:

* include/bits/iterator_concepts.h (__detail::__indirect_value):
Define.
(__indirect_value_t): Define as per P2609R3.
(iter_common_reference_t): Adjust as per P2609R3.
(indirectly_unary_invocable): Likewise.
(indirectly_regular_unary_invocable): Likewise.
(indirect_unary_predicate): Likewise.
(indirect_binary_predicate): Likewise.
(indirect_equivalence_relation): Likewise.
(indirect_strict_weak_order): Likewise.
(__detail::__projected::__type): Define member aliases
__projected_Iter and __projected_Proj providing the
template arguments of the current specialization.
* include/bits/version.def (ranges): Update value.
* include/bits/version.h: Regenerate.
* testsuite/24_iterators/indirect_callable/p2609r3.cc: New test.
* testsuite/std/ranges/version_c++23.cc: Update expected value
of __cpp_lib_ranges macro.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit b552730faf36f1eae1dc6e73ccc93a016dec5401)

Daily bump.

tree-optimization/117104 - add missed guards to max(a,b) != a simplification

For vector types we have to make sure the comparison result is a vector
type and the resulting compare operation is supported. As the resulting
compare is never an equality compare I didn't bother to check for the
cbranch case.
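
For reference, the scalar identities behind this kind of simplification
can be checked by brute force.  This is a sketch of my reading of the
pattern (max (a, b) != a is a < b, and min (a, b) != a is a > b); the
function names are hypothetical and it says nothing about the vector
guards added by the patch.

```cpp
#include <algorithm>
#include <cassert>

// Check (max (a, b) != a) == (a < b) for every pair in [lo, hi].
bool max_ne_matches_less (int lo, int hi)
{
  for (int a = lo; a <= hi; ++a)
    for (int b = lo; b <= hi; ++b)
      if ((std::max (a, b) != a) != (a < b))
        return false;
  return true;
}

// Check (min (a, b) != a) == (a > b) for every pair in [lo, hi].
bool min_ne_matches_greater (int lo, int hi)
{
  for (int a = lo; a <= hi; ++a)
    for (int b = lo; b <= hi; ++b)
      if ((std::min (a, b) != a) != (a > b))
        return false;
  return true;
}
```

In both cases the resulting compare is an ordering compare, not an
equality compare.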

PR tree-optimization/117104
* match.pd ((cmp:c (minmax:c @0 @1) @0) -> (out @0 @1)): Properly
guard the vector case.

* gcc.dg/pr117104.c: New testcase.

(cherry picked from commit f54d42e00007e7a558b273d87f95b3e5b1938f5a)

tree-optimization/116982 - analyze scalar loop exit early

The following makes sure to discover the scalar loop IV exit during
analysis as failure to do so (if DCE and friends are disabled this
can happen due to if-conversion doing DCE and FRE on the if-converted
loop) would ICE later.

I refrained from larger refactoring to be able to eventually backport.

PR tree-optimization/116982
* tree-vectorizer.h (vect_analyze_loop): Pass in .LOOP_VECTORIZED
call.
(vect_analyze_loop_form): Likewise.
* tree-vect-loop.cc (vect_analyze_loop_form): Reject loops where we
cannot determine an IV exit for the scalar loop.
(vect_analyze_loop): Adjust.
* tree-vectorizer.cc (try_vectorize_loop_1): Likewise.
* tree-parloops.cc (gather_scalar_reductions): Likewise.

(cherry picked from commit 9b86efd5210101954bd187c3aa8bb909610a5746)

tree-optimization/116907 - stale BLOCK reference from DECL_VALUE_EXPR

When we remove unused BLOCKs we fail to clean references to them
from DECL_VALUE_EXPRs of variables in other BLOCKs which in the
PR causes LTO streaming to walk into pointers to GGC freed blocks.

There's the question of whether such DECL_VALUE_EXPRs should keep
variables and blocks referenced live (it doesn't seem to do that)
and whether such DECL_VALUE_EXPRs should have survived in the
first place.

PR tree-optimization/116907
* tree-ssa-live.cc (clear_unused_block_pointer_in_block): New
helper.
(clear_unused_block_pointer): Call it.

(cherry picked from commit 7d15248d41dc45a4ba2d38ff532b672a5c0651d0)

tree-optimization/116481 - avoid building function_type[]

The following avoids building an array type with function or method
element type during diagnosing an array bound violation as this
will result in an error, rejecting a program with a not too useful
error message. Instead build such array type manually.

PR tree-optimization/116481
* pointer-query.cc (build_printable_array_type):
Build an array type with function or method element type
manually to avoid bogus diagnostic.

* gcc.dg/pr116481.c: New testcase.

(cherry picked from commit 1506027347776a2f6ec5b92d56ef192e85944e2e)

tree-optimization/116290 - fix compare-debug issue in ldist

Loop distribution does different analysis with -g0/-g due to counting
a debug stmt starting a BB against a limit which will eventually
lead to different IVOPTs choices. I've fixed a possible IVOPTs
issue on the way even though it doesn't make a difference here.

PR tree-optimization/116290
* tree-loop-distribution.cc (determine_reduction_stmt_1): PHIs
have no debug variants. Start with first non-debug real stmt.
* tree-ssa-loop-ivopts.cc (find_givs_in_bb): Do not analyze
debug stmts.

* gcc.dg/pr116290.c: New testcase.

(cherry picked from commit 566740013b3445162b8c4bc2205e4e568d014968)

middle-end/115110 - Fix view_converted_memref_p

view_converted_memref_p was checking the reference type against the
pointer type of the offset operand rather than its pointed-to type,
which leads to all refs being subject to view-convert treatment
in get_alias_set.  This causes numerous testsuite failures and, with
its new uses from r15-512-g9b7cad5884f21c, is also a wrong-code issue.

PR middle-end/115110
* tree-ssa-alias.cc (view_converted_memref_p): Fix.

(cherry picked from commit a5b3721c06646bf5b9b50a22964e8e2bd4d03f5f)

rs6000: Correct the function code for _AMO_LD_DEC_BOUNDED

Corrected the function code for the Atomic Memory Operation "Fetch and Decrement
Bounded", changing it from 0x1A to 0x1C.

2024-10-11 Jeevitha Palanisamy <jeevitha@linux.ibm.com>

gcc/

* config/rs6000/amo.h (enum _AMO_LD): Correct the function code for
_AMO_LD_DEC_BOUNDED.

(cherry picked from commit 1a4c5643a5911d130dfab9a064222baeeb7f9be7)

Refine splitters related to "combine vpcmpuw + zero_extend to vpcmpuw"

r12-6103-g1a7ce8570997eb combines vpcmpuw + zero_extend to vpcmpuw
with the pre_reload splitter, but the splitter transforms the
zero_extend into a subreg, which makes reload think the upper part is
garbage, and that is not correct.

The patch adjusts the zero_extend define_insn_and_split to
define_insn to keep zero_extend.

gcc/ChangeLog:

PR target/117159
* config/i386/sse.md
(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Change from define_insn_and_split to define_insn.
(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>):
Ditto.
(*<avx512>_cmp<V48H_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Split to the zero_extend pattern.
(*<avx512>_cmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
(*<avx512>_ucmp<VI12_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.
(*<avx512>_ucmp<VI48_AVX512VL:mode>3_zero_extend<SWI248x:mode>_2):
Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117159.c: New test.
* gcc.target/i386/avx512bw-pr103750-1.c: Remove xfail.
* gcc.target/i386/avx512bw-pr103750-2.c: Remove xfail.

(cherry picked from commit 5259d3927c1c8e3a15b4b844adef59b48c241233)

Daily bump.

ipa: Treat static constructors and destructors as non-local (PR 115815)

In PR 115815, IPA-SRA thought it had control over all invocations of a
(recursive) static destructor but it did not see the implied
invocation which led to the original being left behind and the
clean-up code encountering uses of SSAs that definitely should have
been dead.

Fixed by teaching cgraph_node::can_be_local_p about static
constructors and destructors.  Similar test is missing in
cgraph_node::local_p so I added the check there as well.

In addition to the commit with the fix, this backport also contains
squashed commit 1a458bdeb223ffa501bac8e76182115681967094 which fixes
dejagnu directives in the testcase.

gcc/ChangeLog:

2024-07-25  Martin Jambor  <mjambor@suse.cz>

PR ipa/115815
* cgraph.cc (cgraph_node_cannot_be_local_p_1): Also check
DECL_STATIC_CONSTRUCTOR and DECL_STATIC_DESTRUCTOR.
* ipa-visibility.cc (non_local_p): Likewise.
(cgraph_node::local_p): Delete extraneous line of tabs.

gcc/testsuite/ChangeLog:

2024-07-25  Martin Jambor  <mjambor@suse.cz>

PR ipa/115815
* gcc.dg/lto/pr115815_0.c: New test.

(cherry picked from commit e98ad6a049c96c21cf641954584c2f5b7df0ce93)

RISC-V: Bugfix for C++ code compilation failure with rv32imafc_zve32f [PR116883]

From: xuli <xuli1@eswincomputing.com>

Example as follows:

int main()
{
  unsigned long arraya[128], arrayb[128], arrayc[128];
  for (int i = 0; i < 128; i++)
   {
      arraya[i] = arrayb[i] + arrayc[i];
   }
  return 0;
}

Compiled with -march=rv32imafc_zve32f -mabi=ilp32f, it will cause a compilation issue:

riscv_vector.h:40:25: error: ambiguating new declaration of 'vint64m4_t __riscv_vle64(vbool16_t, const long long int*, unsigned int)'
   40 | #pragma riscv intrinsic "vector"
      |                         ^~~~~~~~
riscv_vector.h:40:25: note: old declaration 'vint64m1_t __riscv_vle64(vbool64_t, const long long int*, unsigned int)'

With zvl=32b, vbool16_t is registered in init_builtins() with
type_common.precision=0x101 (nunits=2), mode_nunits[E_RVVMF16BI]=[2,2].

Normally, vbool64_t is only valid when TARGET_MIN_VLEN > 32, so vbool64_t
is not registered in init_builtins(), meaning vbool64_t=null.

In order to implement __attribute__((target("arch=+v"))), we must register
all vector types and all RVV intrinsics. Therefore, vbool64_t will be registered
by default with zvl=128b in reinit_builtins(), resulting in
type_common.precision=0x101 (nunits=2) and mode_nunits[E_RVVMF64BI]=[2,2].

We then get TYPE_VECTOR_SUBPARTS(vbool16_t) == TYPE_VECTOR_SUBPARTS(vbool64_t),
calculated using type_common.precision, resulting in 2. Since vbool16_t and
vbool64_t have the same element type (boolean_type), the compiler treats them
as the same type, leading to a re-declaration conflict.

After all types and intrinsics have been registered, processing
__attribute__((target("arch=+v"))) will update the parameters option and
init_adjust_machine_modes. Therefore, to avoid conflicts, we can choose
zvl=4096b for the null type reinit_builtins().

command option zvl=32b
  type         nunits
  vbool64_t => null
  vbool32_t => [1,1]
  vbool16_t => [2,2]
  vbool8_t  => [4,4]
  vbool4_t  => [8,8]
  vbool2_t  => [16,16]
  vbool1_t  => [32,32]

reinit zvl=128b
  vbool64_t => [2,2] conflict with zvl32b vbool16_t=> [2,2]
reinit zvl=256b
  vbool64_t => [4,4] conflict with zvl32b vbool8_t=>  [4,4]
reinit zvl=512b
  vbool64_t => [8,8] conflict with zvl32b vbool4_t=>  [8,8]
reinit zvl=1024b
  vbool64_t => [16,16] conflict with zvl32b vbool2_t=>  [16,16]
reinit zvl=2048b
  vbool64_t => [32,32] conflict with zvl32b vbool1_t=>  [32,32]
reinit zvl=4096b
  vbool64_t => [64,64] zvl=4096b is ok
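
The table can be reproduced with a simplified model in which
vbool<N>_t has zvl / N subparts.  This is only a sketch of the
reasoning above, not the code in riscv-c.cc; the helper names are
hypothetical and the model ignores everything except the subpart count.

```cpp
#include <cassert>
#include <initializer_list>

// Subparts of vbool<ratio>_t for a given zvl, or 0 for a "null" type
// that is not registered at that zvl.
constexpr int nunits (int zvl_bits, int ratio)
{
  return zvl_bits >= ratio ? zvl_bits / ratio : 0;
}

// True if vbool64_t registered with REINIT_ZVL has the same number of
// subparts as some other vbool type registered with CMDLINE_ZVL.
bool vbool64_conflicts (int cmdline_zvl, int reinit_zvl)
{
  const int n64 = nunits (reinit_zvl, 64);
  for (int ratio : {32, 16, 8, 4, 2, 1})
    if (nunits (cmdline_zvl, ratio) == n64)
      return true;
  return false;
}

// Smallest power-of-two zvl, starting from 128, at which vbool64_t no
// longer collides with a type registered for CMDLINE_ZVL.
int smallest_safe_reinit_zvl (int cmdline_zvl)
{
  int zvl = 128;
  while (vbool64_conflicts (cmdline_zvl, zvl))
    zvl *= 2;
  return zvl;
}
```

Under this model, calling smallest_safe_reinit_zvl (32) walks through
128, 256, 512, 1024 and 2048, each of which collides with one of the
zvl=32b types, and stops at 4096.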

Signed-off-by: xuli <xuli1@eswincomputing.com>
PR target/116883

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_pragma_intrinsic_flags_pollute): Choose zvl4096b
to initialize null type.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/pr116883.C: New test.

(cherry picked from commit fd8e590ff11266598d8f9b3d03d72ba7a6100512)

c++: checking ICE w/ constexpr if and lambda as def targ [PR117054]

Here we're tripping over the assert in extract_locals_r which enforces
that an extra-args tree appearing inside another extra-args tree doesn't
actually have extra args. This invariant doesn't always hold for lambdas
(which recently gained the extra-args mechanism) but that should be
harmless since cp_walk_subtrees doesn't walk LAMBDA_EXPR_EXTRA_ARGS and
so should be immune to the PR114303 issue for now. So let's just disable
this assert for lambdas.

PR c++/117054

gcc/cp/ChangeLog:

* pt.cc (extract_locals_r): Disable tree_extra_args assert
for LAMBDA_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ9.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit bb2bfdb2048aed18ef7dc01b51816a800e83ce54)

c++: ICE with -Wtautological-compare in template [PR116534]

Pre r14-4793, we'd call warn_tautological_cmp -> operand_equal_p
with operands wrapped in NON_DEPENDENT_EXPR, which works, since
o_e_p bails for codes it doesn't know. But now we pass operands
not encapsulated in NON_DEPENDENT_EXPR, and crash, because the
template tree for &a[x] has null DECL_FIELD_OFFSET.

This patch extends r12-7797 to cover the case when DECL_FIELD_OFFSET
is null.

PR c++/116534

gcc/ChangeLog:

* fold-const.cc (operand_compare::operand_equal_p): If either
field's DECL_FIELD_OFFSET is null, compare the fields with ==.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wtautological-compare4.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 7ca486889b1b1c7e7bcbbca3b6caa103294ec07d)

c++: wrong error due to std::initializer_list opt [PR116476]

Here maybe_init_list_as_array gets elttype=field, init={NON_LVALUE_EXPR <2>}
and it tries to convert the init's element type (int) to field
using implicit_conversion, which works, so overall maybe_init_list_as_array
is successful.

But it constifies init_elttype so we end up with "const int". Later,
when we actually perform the conversion and invoke field::field(T&&),
we end up with this error:

error: binding reference of type 'int&&' to 'const int' discards qualifiers

So I think maybe_init_list_as_array should try to perform the conversion,
like it does below with fc.
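
The language rule behind the error can be shown in isolation.  This is
a hypothetical stand-in for the field type from the report, not the
testcase added by the patch: a constructor taking int&& accepts an int
rvalue but not a const int.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical reduction: a class whose only converting constructor
// takes an rvalue reference to its template argument.
template <typename T>
struct field
{
  field (T&&) {}
};

// A non-const int rvalue binds to int&&.
static_assert (std::is_constructible<field<int>, int>::value,
               "int rvalue converts to field<int>");
// A const int does not: binding would discard the const qualifier.
static_assert (!std::is_constructible<field<int>, const int>::value,
               "const int must not convert to field<int>");

// Run-time view of the same two facts.
bool constification_breaks_conversion ()
{
  return std::is_constructible<field<int>, int>::value
         && !std::is_constructible<field<int>, const int>::value;
}
```

So a check done with the unqualified element type can succeed while the
later conversion from the constified type fails.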

PR c++/116476

gcc/cp/ChangeLog:

* call.cc (maybe_init_list_as_array): Try convert_like and see if it
worked.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-opt2.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 9f79c7ddff5f1b004803931406ad17eaba095fff)

c++: ICE with ()-init and TARGET_EXPR eliding [PR116424]

Here we crash on a cp_gimplify_expr/TARGET_EXPR assert:

      gcc_checking_assert (!TARGET_EXPR_ELIDING_P (*expr_p)
                           || !TREE_ADDRESSABLE (TREE_TYPE (*expr_p)));

We cannot elide the TARGET_EXPR because we're taking its address.

It is set as eliding in massage_init_elt.  I've tried to not set
TARGET_EXPR_ELIDING_P when the context is not direct-initialization.
That didn't work: even when it's not direct-initialization now, it
can become one later, for instance, after split_nonconstant_init.
One problem is that replace_placeholders_for_class_temp_r will replace
placeholders in non-eliding TARGET_EXPRs with the slot, but if we then
elide the TARGET_EXPR, we end up with a "stray" VAR_DECL and crash.
(Only some TARGET_EXPRs are handled by replace_decl.)

I thought I'd have to go back to
<https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651163.html> but
then I realized that this problem occurs only with ()-init but not
{}-init.  With {}-init, there is no problem, because we are clearing
TARGET_EXPR_ELIDING_P in process_init_constructor_record:

       /* We can't actually elide the temporary when initializing a
          potentially-overlapping field from a function that returns by
          value.  */
       if (ce->index
           && TREE_CODE (next) == TARGET_EXPR
           && unsafe_copy_elision_p (ce->index, next))
         TARGET_EXPR_ELIDING_P (next) = false;

But that does not happen for ()-init because we have no ce->index.
()-init doesn't allow brace elision so we don't really reshape them.

But I can just move the clearing a few lines down and then it handles
both ()-init and {}-init.

PR c++/116424

gcc/cp/ChangeLog:

* typeck2.cc (process_init_constructor_record): Move the clearing of
TARGET_EXPR_ELIDING_P down.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/paren-init38.C: New test.

(cherry picked from commit 15f857af2943a4aa282d04ff71f860352ad3291b)

i386: Fix expand_vector_set for VEC_MERGE/VEC_DUPLICATE RTX [PR117116]

Middle end can generate SYMBOL_REF RTX as a value "val" in the call
to expand_vector_set, but SYMBOL_REF RTX is not accepted in
<sse2p4_1>_pinsr<ssemodesuffix> insn pattern, generated via
VEC_MERGE/VEC_DUPLICATE RTX path.

Force the value into a register before VEC_MERGE/VEC_DUPLICATE RTX
is generated if it doesn't satisfy nonimmediate_operand predicate.

PR target/117116

gcc/ChangeLog:

* config/i386/i386-expand.cc (expand_vector_set): Force "val"
into a register before VEC_MERGE/VEC_DUPLICATE RTX is generated
if it doesn't satisfy nonimmediate_operand predicate.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117116.c: New test.

(cherry picked from commit 80d7032067a3a5b76aecd657d9b35b0a8f5a941d)

Daily bump.

libstdc++: Fix Python deprecation warning in printers.py

python/libstdcxx/v6/printers.py:1355: DeprecationWarning: 'count' is passed as positional argument

The Python docs say:

  Deprecated since version 3.13: Passing count and flags as positional
  arguments is deprecated. In future Python versions they will be
  keyword-only parameters.

Using a keyword argument for count only became possible with Python 3.1
so introduce a new function to do the substitution.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py (strip_fundts_namespace): New.
(StdExpAnyPrinter, StdExpOptionalPrinter): Use it.

(cherry picked from commit b9e98bb9919fa9f07782f23f79b3d35abb9ff542)

libstdc++: Increase timeouts for PSTL tests in debug mode [PR90276]

These tests compile very slowly in debug mode.

libstdc++-v3/ChangeLog:

PR libstdc++/90276
* testsuite/25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc:
Increase timeout for debug mode.
* testsuite/25_algorithms/pstl/alg_modifying_operations/transform_binary.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_nonmodifying/mismatch.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/lexicographical_compare.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/minmax_element.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_symmetric_difference.cc:
Likewise.

(cherry picked from commit e65b6627a36869b01bbe128a5324e4b415b28880)

libstdc++: Implement LWG 3564 for ranges::transform_view

The _Iterator<true> type returned by begin() const uses const F& to
transform the elements, so it should use const F& to determine the
iterator's value_type and iterator_category as well.

This was accepted into the WP in July 2022.

libstdc++-v3/ChangeLog:

* include/std/ranges (transform_view::_Iterator): Use const F&
to determine value_type and iterator_category of
_Iterator<true>, as per LWG 3564.
* testsuite/std/ranges/adaptors/transform.cc: Check value_type
and iterator_category.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
(cherry picked from commit dde19c600c3c8a1d765c9b4961d2556e89edad14)

libstdc++: Fix localized %c formatting for <chrono> [PR117085]

When formatting a time point with %c we call std::vformat_to using the
formatting locale's D_T_FMT string, but we weren't adding the L option
to the format string. This meant we always interpreted D_T_FMT in the C
locale, instead of using the formatting locale as obviously intended
when %c is used.

libstdc++-v3/ChangeLog:

PR libstdc++/117085
* include/bits/chrono_io.h (__formatter_chrono::_M_c): Add L
option to format string.
* testsuite/std/time/format.cc: Move to...
* testsuite/std/time/format/format.cc: ...here.
* testsuite/std/time/format/pr117085.cc: New test.

(cherry picked from commit 4ad697bb7f1aad252e1398c6f13eed3fa6d0ca5b)

libstdc++: Tweak %c formatting for chrono types

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_c): Add
[[unlikely]] attribute to condition for missing %c format in
locale. Use %T instead of %H:%M:%S in fallback.

(cherry picked from commit ce89d2f3170e0d6474cee2c5cb9d478426a5b2f6)

libstdc++: Populate generic std::time_get's wide %c format [PR117135]

I missed out the __timepunct<wchar_t> specialization for the "generic"
implementation when defining the %c format in r15-4016-gc534e37faccf48.

libstdc++-v3/ChangeLog:

PR libstdc++/117135
* config/locale/generic/time_members.cc
(__timepunct<wchar_t>::_M_initialize_timepunct): Set
_M_date_time_format for C locale. Set %Ex formats to the same
values as the %x formats.

(cherry picked from commit 707d84efee7f7eb5a336935f386e094402f267a6)

libstdc++: Use std::move for iterator in ranges::fill [PR117094]

Input iterators aren't required to be copyable.
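
As a rough illustration of why the move is needed (a simplified sketch, not the libstdc++ code): MoveOnlyOut, fill_n_sketch and fill_counted_sketch below are hypothetical names. Forwarding the iterator to a helper without std::move would attempt a copy and fail to compile.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical move-only output iterator; copying is deleted.
struct MoveOnlyOut {
  std::vector<int>* sink;
  explicit MoveOnlyOut(std::vector<int>* s) : sink(s) {}
  MoveOnlyOut(MoveOnlyOut&&) = default;
  MoveOnlyOut& operator=(MoveOnlyOut&&) = default;
  MoveOnlyOut(const MoveOnlyOut&) = delete;
  MoveOnlyOut& operator=(const MoveOnlyOut&) = delete;

  MoveOnlyOut& operator*() { return *this; }
  MoveOnlyOut& operator=(int v) { sink->push_back(v); return *this; }
  MoveOnlyOut& operator++() { return *this; }
};

// Hypothetical helper: writes n copies of value through the iterator.
template <typename Out, typename T>
Out fill_n_sketch(Out first, int n, const T& value) {
  for (; n > 0; --n) {
    *first = value;
    ++first;
  }
  return first;  // a by-value parameter is implicitly moved on return
}

// Hypothetical wrapper that forwards to the helper.
template <typename Out, typename T>
Out fill_counted_sketch(Out first, int n, const T& value) {
  assert(n >= 0);
  // Passing `first` without std::move would copy it, which is
  // ill-formed for MoveOnlyOut.
  return fill_n_sketch(std::move(first), n, value);
}
```

Calling fill_counted_sketch(MoveOnlyOut(&v), 3, 7) on an empty std::vector<int> v appends three elements equal to 7.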

libstdc++-v3/ChangeLog:

PR libstdc++/117094
* include/bits/ranges_algobase.h (__fill_fn): Use std::move for
iterator that might not be copyable.
* testsuite/25_algorithms/fill/constrained.cc: Check
non-copyable iterator with sized sentinel.

(cherry picked from commit 03623fa91ff36ecb9faa3b55f7842a39b759594e)

middle-end: Fix ifcvt predicate generation for masked function calls

Up until now, due to a latent bug in the code for the ifcvt pass,
irrespective of the branch taken in a conditional statement, the
original condition for the if statement was used in masking the
function call.

Thus, for code such as:

  if (a[i] > limit)
    b[i] = fixed_const;
  else
    b[i] = fn (a[i]);

we would generate the following (wrong) if-converted tree code:

  _1 = a[i_1];
  _2 = _1 > limit;
  _3 = .MASK_CALL (fn, _1, _2);
  cstore_4 = _2 ? fixed_const : _3;

as opposed to the correct expected sequence:

  _1 = a[i_1];
  _2 = _1 > limit;
  _3 = ~_2;
  _4 = .MASK_CALL (fn, _1, _3);
  cstore_5 = _2 ? fixed_const : _4;

This patch ensures that the correct predicate mask generation is
carried out such that, upon autovectorization, the correct vector
lanes are selected in the vectorized function call.
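
The effect of the two sequences can be modelled lane by lane in plain C++. This is only a sketch: fn, mask_call, reference and if_converted are hypothetical stand-ins, and returning 0 for an inactive lane is an assumption made purely so that the difference is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical callee standing in for `fn` in the example above.
inline int fn(int x) { return x * 2; }

// Models .MASK_CALL for one lane: fn runs only where the mask is set.
// The value 0 for an inactive lane is an assumption of this sketch.
inline int mask_call(int x, bool mask) { return mask ? fn(x) : 0; }

// Semantics of the original scalar loop.
std::vector<int> reference(const std::vector<int>& a, int limit,
                           int fixed_const) {
  std::vector<int> b(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    b[i] = a[i] > limit ? fixed_const : fn(a[i]);
  return b;
}

// If-converted form. negate_mask == false reproduces the wrong
// sequence (the call is masked with _2); negate_mask == true
// reproduces the corrected one (the call is masked with ~_2).
std::vector<int> if_converted(const std::vector<int>& a, int limit,
                              int fixed_const, bool negate_mask) {
  std::vector<int> b(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    bool cond = a[i] > limit;                        // _2
    bool call_mask = negate_mask ? !cond : cond;     // ~_2 or _2
    int call_result = mask_call(a[i], call_mask);    // .MASK_CALL
    b[i] = cond ? fixed_const : call_result;         // cstore
  }
  return b;
}
```

With the inverted mask the if-converted result matches the scalar loop. With the original condition as the mask, every lane that takes the else branch receives the inactive-lane value instead of fn (a[i]).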

gcc/ChangeLog:

* tree-if-conv.cc (predicate_statements): Fix handling of
predicated function calls.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-fncall-mask.c: New.

Fix handling of ECF_NOVOPS in ipa-modref

As shown in a somewhat convoluted testcase, ipa-modref is mistreating
ECF_NOVOPS as "having no side effects". This comes from the time when
modref cared only about memory accesses and thus it was possible to
shortcut on it.

This patch removes (hopefully) all those bad shortcuts.
Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

PR ipa/109985

* ipa-modref.cc (modref_summary::useful_p): Fix handling of ECF_NOVOPS.
(modref_access_analysis::process_fnspec): Likewise.
(modref_access_analysis::analyze_call): Likewise.
(propagate_unknown_call): Likewise.
(modref_propagate_in_scc): Likevise.
(modref_propagate_flags_in_scc): Likewise.
(ipa_merge_modref_summary_after_inlining): Likewise.

(cherry picked from commit efcbe7b985e24ac002a863afd609c44a67761195)

aarch64: Fix caller saves of VNx2QI [PR116238]

The testcase contains a VNx2QImode pseudo that is live across a call
and that cannot be allocated a call-preserved register.  LRA quite
reasonably tried to save it before the call and restore it afterwards.
Unfortunately, the target told it to do that in SImode, even though
punning between SImode and VNx2QImode is disallowed by both
TARGET_CAN_CHANGE_MODE_CLASS and TARGET_MODES_TIEABLE_P.

The natural class to use for SImode is GENERAL_REGS, so this led
to an unsalvageable situation in which we had:

  (set (subreg:VNx2QI (reg:SI A) 0) (reg:VNx2QI B))

where A needed GENERAL_REGS and B needed FP_REGS.  We therefore ended
up in a reload loop.

The hooks above should ensure that this situation can never occur
for incoming subregs.  It only happened here because the target
explicitly forced it.

The decision to use SImode for modes smaller than 4 bytes dates
back to the beginning of the port, before 16-bit floating-point
modes existed.  I'm not sure whether promoting to SImode really
makes sense for any FPR, but that's a separate performance/QoI
discussion.  For now, this patch just disallows using SImode
when it is wrong for correctness reasons, since that should be
safer to backport.

gcc/
PR testsuite/116238
* config/aarch64/aarch64.cc (aarch64_hard_regno_caller_save_mode):
Only return SImode if we can convert to and from it.

gcc/testsuite/
PR testsuite/116238
* gcc.target/aarch64/sve/pr116238.c: New test.

(cherry picked from commit ec9d6d45191f639482344362d048294e74587ca3)

Add regression test

gcc/testsuite/
PR ada/114593
* gnat.dg/specs/generic_inst2-child2.ads: New test.
* gnat.dg/specs/generic_inst2.ads: New helper.
* gnat.dg/specs/generic_inst2-child1.ads: Likewise.

ada: Type conversion in instance incorrectly rejected.

In some cases, a legal type conversion in a generic package is correctly
accepted but the corresponding type conversion in an instance of the generic
is incorrectly rejected.

gcc/ada/
PR ada/114593
* sem_res.adb (Valid_Conversion): Test In_Instance instead of
In_Instance_Body.

Add a new tune avx256_avoid_vec_perm for SRF.

According to the Intel SOM[1], for Crestmont, most 256-bit Intel AVX2
instructions can be decomposed into two independent 128-bit
micro-operations, except for a subset of Intel AVX2 instructions
known as cross-lane operations, which can only compute the result for
an element by utilizing one or more sources belonging to other elements.

The 256-bit instructions listed below use more operand sources than
can be natively supported by a single reservation station within these
microarchitectures. They are decomposed into two μops, where the first
μop resolves a subset of operand dependencies across two cycles. The
dependent second μop executes the 256-bit operation by using a single
128-bit execution port for two consecutive cycles with a five-cycle
latency for a total latency of seven cycles.

VPERM2I128 ymm1, ymm2, ymm3/m256, imm8
VPERM2F128 ymm1, ymm2, ymm3/m256, imm8
VPERMPD ymm1, ymm2/m256, imm8
VPERMPS ymm1, ymm2, ymm3/m256
VPERMD ymm1, ymm2, ymm3/m256
VPERMQ ymm1, ymm2/m256, imm8

Instead of setting tune avx128_optimal for SRF, the patch adds a new
tune avx256_avoid_vec_perm for it, so by default the vectorizer still
uses a 256-bit VF if the cost is profitable, but lowers to 128-bit
whenever a 256-bit vec_perm is needed for auto-vectorization. Without
vec_perm, the performance of 256-bit vectorization should be similar to
that of 128-bit vectorization (some benchmark results show it's even
better, since it enables more parallelism for convert cases).

[1] https://www.intel.com/content/www/us/en/content-details/814198/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html

gcc/ChangeLog:

* config/i386/i386.cc (ix86_vector_costs::ix86_vector_costs):
Add new member m_num_avx256_vec_perm.
(ix86_vector_costs::add_stmt_cost): Record 256-bit vec_perm.
(ix86_vector_costs::finish_cost): Prevent vectorization for
TARGET_AVX256_AVOID_VEC_PERM when there's 256-bit vec_perm
instruction.
* config/i386/i386.h (TARGET_AVX256_AVOID_VEC_PERM): New
Macro.
* config/i386/x86-tune.def (X86_TUNE_AVX256_SPLIT_REGS): Add
m_CORE_ATOM.
(X86_TUNE_AVX256_AVOID_VEC_PERM): New tune.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx256_avoid_vec_perm.c: New test.

(cherry picked from commit 9eaecce3d8c1d9349adbf8c2cdaf8d87672ed29c)

Add new microarchitecture tune for SRF/GRR/CWF.

For Crestmont, 4-operand VEX blendv instructions come from MSROM and
are slower than the 3-instruction sequence (op1 & mask) | (op2 & ~mask).
The legacy blendv instruction can still be handled by the decoder.

The patch adds a new tune which is enabled for all processors except
for SRF/CWF. It will use vpand + vpandn + vpor instead of
vpblendvb (similarly for vblendvps/vblendvpd) for SRF/CWF.
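
The equivalence behind the (op1 & mask) | (op2 & ~mask) sequence can be checked on a single byte lane. This is a sketch under the assumption that each mask lane is either all-ones or all-zeros, as a vector comparison would produce; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reference select for one byte lane: a set mask picks op1,
// a clear mask picks op2.
inline std::uint8_t select_ref(std::uint8_t op1, std::uint8_t op2,
                               std::uint8_t mask) {
  return mask ? op1 : op2;
}

// The same select as three bitwise operations, corresponding to
// and + andn + or. Only valid when mask is 0x00 or 0xFF.
inline std::uint8_t select_bitwise(std::uint8_t op1, std::uint8_t op2,
                                   std::uint8_t mask) {
  assert(mask == 0x00 || mask == 0xFF);
  return static_cast<std::uint8_t>((op1 & mask) | (op2 & ~mask));
}
```

For a mask with only some bits set in a lane, the bitwise form mixes bits from both operands, which is why the all-ones or all-zeros assumption matters.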

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Guard
instruction blendv generation under new tune.
* config/i386/i386.h (TARGET_SSE_MOVCC_USE_BLENDV): New Macro.
* config/i386/x86-tune.def (X86_TUNE_SSE_MOVCC_USE_BLENDV):
New tune.

(cherry picked from commit 9c8cea8feb6cd54ef73113a0b74f1df7b60d09dc)

Daily bump.

testsuite: fix PR111613 test

PR ipa/111613
* gcc.c-torture/pr111613.c: Rename to...
* gcc.c-torture/execute/pr111613.c: ...this.

(cherry picked from commit 5e5d7a88932b132437069f716160f8b20862890b)

tree-optimization/117041 - fix load classification of former grouped load

When we first detect a grouped load but later dis-associate it we
only set DR_GROUP_FIRST_ELEMENT to NULL, indicating it is not a
STMT_VINFO_GROUPED_ACCESS but leave DR_GROUP_NEXT_ELEMENT set. This
causes a stray DR_GROUP_NEXT_ELEMENT access in get_group_load_store_type
to go wrong, indicating a load isn't single_element_p when it actually
is, leading to wrong classification and an ICE.

PR tree-optimization/117041
* tree-vect-stmts.cc (get_group_load_store_type): Only
check DR_GROUP_NEXT_ELEMENT for STMT_VINFO_GROUPED_ACCESS.

* gcc.dg/torture/pr117041.c: New testcase.

(cherry picked from commit 72c83f644dea755b4eba427aabde45f5d3694d9b)

middle-end/117086 - fixup vec_cond simplifications

The following adds missing checks for a vector type result type
to simplifications that end up creating a vec_cond.

PR middle-end/117086
* match.pd ((op (vec_cond ...) ..) -> (vec_cond ...)): Add
missing checks for VECTOR_TYPE_P (type).

* gcc.dg/torture/pr117086.c: New testcase.

(cherry picked from commit c64ae8377210bde44714d265311ee7bfa2733df9)

tree-optimization/116990 - missed control flow check in vect_analyze_loop_form

The following fixes checking for unsupported control flow in
vectorization to also cover the outer loop body.

PR tree-optimization/116990
* tree-vect-loop.cc (vect_analyze_loop_form): Check the current
loop body for control flow.

(cherry picked from commit b0b71618157ddac52266909978f331406f98f3a2)

tree-optimization/116879 - failure to recognize non-empty latch

When we relaxed the vectorizer's constraint on loop structure, verifying
the emptiness of the latch became too loose, as can be seen in the case
of PR116879 where the latch effectively contains two basic blocks,
with one being an unmerged forwarder that's not empty.

PR tree-optimization/116879
* tree-vect-loop.cc (vect_analyze_loop_form): Scan all
blocks that form the latch.

* gcc.dg/pr116879.c: New testcase.

(cherry picked from commit 18e905b461a7138185cf4f0efde4a4e1214fb798)