
GCN, nvptx libstdc++: Force use of '__atomic' builtins [PR119645]

For both GCN and nvptx, this gets rid of the 'configure'-time warnings:

    configure: WARNING: No native atomic operations are provided for this platform.
    configure: WARNING: They will be faked using a mutex.
    configure: WARNING: Performance of certain classes will degrade as a result.

..., and changes:

    -checking for lock policy for shared_ptr reference counts... mutex
    +checking for lock policy for shared_ptr reference counts... atomic

That means, '[...]/[target]/libstdc++-v3/', 'Makefile's change:

    -ATOMICITY_SRCDIR = config/cpu/generic/atomicity_mutex
    +ATOMICITY_SRCDIR = config/cpu/generic/atomicity_builtins

..., and '[...]/[target]/libstdc++-v3/config.h' changes:

    /* Defined if shared_ptr reference counting should use atomic operations. */
    -/* #undef HAVE_ATOMIC_LOCK_POLICY */
    +#define HAVE_ATOMIC_LOCK_POLICY 1

    /* Define if the compiler supports C++11 atomics. */
    -/* #undef _GLIBCXX_ATOMIC_BUILTINS */
    +#define _GLIBCXX_ATOMIC_BUILTINS 1

..., and '[...]/[target]/libstdc++-v3/include/[target]/bits/c++config.h'
changes:

    /* Defined if shared_ptr reference counting should use atomic operations. */
    -/* #undef _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY */
    +#define _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY 1

    /* Define if the compiler supports C++11 atomics. */
    -/* #undef _GLIBCXX_ATOMIC_BUILTINS */
    +#define _GLIBCXX_ATOMIC_BUILTINS 1

This means that '[...]/[target]/libstdc++-v3/libsupc++/atomicity.cc'
('[...]/[target]/libstdc++-v3/libsupc++/atomicity.o') now uses atomic
instructions for synchronization instead of C++ static local variables.  The
guard variables of the latter, via 'libstdc++-v3/libsupc++/guard.cc', used
'libgcc/gthr.h' recursive mutexes, which currently are unsupported for GCN.
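
As a rough illustration of what an '__atomic'-builtin-based atomicity layer
relies on, here is a minimal sketch of a reference-count update done with the
GCC '__atomic_fetch_add' builtin, without any mutex.  The names
'exchange_and_add_sketch' and 'refcount_demo_sketch' are hypothetical, and the
code is not taken from 'atomicity.cc'; the real libstdc++ helpers may differ.

```cpp
#include <cassert>

// Hypothetical helper: atomically add 'val' to '*mem' and return the old
// value.  '__atomic_fetch_add' is a GCC builtin; no lock is taken here.
int exchange_and_add_sketch(volatile int *mem, int val)
{
  return __atomic_fetch_add(mem, val, __ATOMIC_ACQ_REL);
}

// Hypothetical use: a reference count that is bumped and dropped again.
bool refcount_demo_sketch()
{
  volatile int refcount = 1;
  exchange_and_add_sketch(&refcount, 1);             // acquire a reference
  int old = exchange_and_add_sketch(&refcount, -1);  // release it again
  assert(old == 2);                                  // value before release
  return refcount == 1;
}
```

Whether the target lowers the builtin to a native atomic instruction or to a
library call depends on the back end; the sketch only shows the source-level
shape.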

For GCN, this turns ~500 libstdc++ execution test FAILs into PASSes, and also
progresses:

    PASS: g++.dg/tree-ssa/pr20458.C  -std=gnu++17 (test for excess errors)
    [-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C  -std=gnu++17 execution test
    PASS: g++.dg/tree-ssa/pr20458.C  -std=gnu++26 (test for excess errors)
    [-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C  -std=gnu++26 execution test
    UNSUPPORTED: g++.dg/tree-ssa/pr20458.C  -std=gnu++98: exception handling not supported

(For nvptx, there is no effective change, due to other misconfiguration.)

PR target/119645
libstdc++-v3/
* acinclude.m4 (GLIBCXX_ENABLE_LOCK_POLICY) [GCN, nvptx]:
Hard-code results.
* configure: Regenerate.
* configure.host [GCN, nvptx] (atomicity_dir): Set to
'cpu/generic/atomicity_builtins'.

(cherry picked from commit 059b5509c14904b55c37f659170240ae0d2c1c8e)

nvptx: Support '-mfake-ptx-alloca': defer failure to run-time 'alloca' usage

Follow-up to commit 1146410c0feb0e82c689b1333fdf530a2b34dc2b
"nvptx: Support '-mfake-ptx-alloca'".  '-mfake-ptx-alloca' is applicable only
for configurations where PTX 'alloca' is not supported, where target libraries
are built with it enabled (that is, libstdc++, libgfortran).

This change progresses:

    [-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C  -std=gnu++17 (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C  -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
    [-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C  -std=gnu++26 (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C  -std=gnu++26 [-compilation failed to produce executable-]{+execution test+}
    UNSUPPORTED: g++.dg/tree-ssa/pr20458.C  -std=gnu++98: exception handling not supported

..., and "enables" a few test cases:

    FAIL: g++.old-deja/g++.other/sibcall1.C  -std=gnu++17 (test for excess errors)
    [Etc.]

    FAIL: g++.old-deja/g++.other/unchanging1.C  -std=gnu++17 (test for excess errors)
    [Etc.]

..., which now (unrelatedly to 'alloca', and in the same way as configurations
where PTX 'alloca' is supported) FAIL due to:

    unresolved symbol _Unwind_DeleteException
    collect2: error: ld returned 1 exit status

Most importantly, it progresses ~830 libstdc++ test cases:

    [-FAIL:-]{+PASS:+} [...] (test for excess errors)

..., with (if applicable, for most of them):

    [-UNRESOLVED:-]{+PASS:+} [...] [-compilation failed to produce executable-]{+execution test+}

..., or just a few 'FAIL: [...] execution test' where these test cases also
FAIL in configurations where PTX 'alloca' is supported, or ~120 instances of
'FAIL: [...]  execution test' due to run-time
'GCC/nvptx: sorry, unimplemented: dynamic stack allocation not supported'.

This change also resolves the cases noted in
commit bac2d8a246892334e24dfa7d62be0cd0648c5606
"nvptx: Build libgfortran with '-mfake-ptx-alloca' [PR107635]":

| With '-mfake-ptx-alloca', libgfortran again succeeds to build, and compared
| to before, we've got only a small number of regressions due to nvptx 'ld'
| complaining about 'unresolved symbol __GCC_nvptx__PTX_alloca_not_supported':
|
|     [-PASS:-]{+FAIL:+} gfortran.dg/coarray/codimension_2.f90 -fcoarray=lib  -O2  -lcaf_single (test for excess errors)

    [-FAIL:-]{+PASS:+} gfortran.dg/coarray/codimension_2.f90 -fcoarray=lib  -O2  -lcaf_single (test for excess errors)

|     [-PASS:-]{+FAIL:+} gfortran.dg/coarray/event_4.f08 -fcoarray=lib  -O2  -lcaf_single (test for excess errors)
|     [-PASS:-]{+UNRESOLVED:+} gfortran.dg/coarray/event_4.f08 -fcoarray=lib  -O2  -lcaf_single [-execution test-]{+compilation failed to produce executable+}

    [-FAIL:-]{+PASS:+} gfortran.dg/coarray/event_4.f08 -fcoarray=lib  -O2  -lcaf_single (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} gfortran.dg/coarray/event_4.f08 -fcoarray=lib  -O2  -lcaf_single [-compilation failed to produce executable-]{+execution test+}

|     [-PASS:-]{+FAIL:+} gfortran.dg/coarray/fail_image_2.f08 -fcoarray=lib  -O2  -lcaf_single (test for excess errors)
|     [-PASS:-]{+UNRESOLVED:+} gfortran.dg/coarray/fail_image_2.f08 -fcoarray=lib  -O2  -lcaf_single [-execution test-]{+compilation failed to produce executable+}

    [-FAIL:-]{+PASS:+} gfortran.dg/coarray/fail_image_2.f08 -fcoarray=lib  -O2  -lcaf_single (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} gfortran.dg/coarray/fail_image_2.f08 -fcoarray=lib  -O2  -lcaf_single [-compilation failed to produce executable-]{+execution test+}

|     [-PASS:-]{+FAIL:+} gfortran.dg/coarray/proc_pointer_assign_1.f90 -fcoarray=lib  -O2  -lcaf_single (test for excess errors)
|     [-PASS:-]{+UNRESOLVED:+} gfortran.dg/coarray/proc_pointer_assign_1.f90 -fcoarray=lib  -O2  -lcaf_single [-execution test-]{+compilation failed to produce executable+}

    [-FAIL:-]{+PASS:+} gfortran.dg/coarray/proc_pointer_assign_1.f90 -fcoarray=lib  -O2  -lcaf_single (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} gfortran.dg/coarray/proc_pointer_assign_1.f90 -fcoarray=lib  -O2  -lcaf_single [-compilation failed to produce executable-]{+execution test+}

|     [-PASS:-]{+FAIL:+} gfortran.dg/coarray_43.f90   -O  (test for excess errors)

    [-FAIL:-]{+PASS:+} gfortran.dg/coarray_43.f90   -O  (test for excess errors)

..., and further progresses:

    [-FAIL:-]{+PASS:+} gfortran.dg/coarray_lib_comm_1.f90   -O0  (test for excess errors)
    [-UNRESOLVED:-]{+FAIL:+} gfortran.dg/coarray_lib_comm_1.f90   -O0  [-compilation failed to produce executable-]{+execution test+}
    [Etc.]

..., which now (unrelatedly to 'alloca', and in the same way as configurations
where PTX 'alloca' is supported) FAILs due to:

    error   : Prototype doesn't match for '_gfortran_caf_transfer_between_remotes' in 'input file 9 at offset 159897', first defined in 'input file 9 at offset 159897'
    error   : Prototype doesn't match for '_gfortran_caf_stop_numeric' in 'input file 9 at offset 159897', first defined in 'input file 9 at offset 159897'
    nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)

gcc/
* config/nvptx/nvptx.opt (-mfake-ptx-alloca): Update.
gcc/testsuite/
* gcc.target/nvptx/alloca-2-O0_-mfake-ptx-alloca.c: Adjust.
libgcc/
* config/nvptx/alloca.c: New.
* config/nvptx/t-nvptx (LIB2ADD): Add it.

(cherry picked from commit 199f1abeef579912b4c40c42519825cedca6530f)

libstdc++, nvptx: Remove machinery to inject per-file flags

Not used anymore.

libstdc++-v3/
* config/cpu/nvptx/t-nvptx: Remove.
* configure.host [nvptx]: Adjust.

(cherry picked from commit 287f360b3e75a19c48ee14c71f51b6e7968474ef)

nvptx: Don't use PTX '.const', constant state space [PR119573]

This avoids "File uses too much global constant data" errors (for the final
executable, or for a single object file), and avoids cases of wrong code generation:
"error : State space incorrect for instruction 'st'" ('st.const'), or another
case where an "illegal instruction was encountered", or a lot of cases where
for two compilation units (such as a library linked with user code) we ran into
"error : Memory space doesn't match" due to differences in '.const' usage
between definition and use of a variable.

We progress:

    ptxas error   : File uses too much global constant data (0x1f01a bytes, 0x10000 max)
    nvptx-run: cuLinkAddData failed: a PTX JIT compilation failed (CUDA_ERROR_INVALID_PTX, 218)

... into:

    PASS: 20_util/to_chars/103955.cc  -std=gnu++17 (test for excess errors)
    [-FAIL:-]{+PASS:+} 20_util/to_chars/103955.cc  -std=gnu++17 execution test

We progress:

    ptxas error   : File uses too much global constant data (0x36c65 bytes, 0x10000 max)
    nvptx-as: ptxas returned 255 exit status

... into:

    [-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c   -O0  {+(test for excess errors)+}
    [-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c   -O1  {+(test for excess errors)+}
    [-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c   -O2  {+(test for excess errors)+}
    [-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c   -O3 -g  {+(test for excess errors)+}
    [-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c   -Os  {+(test for excess errors)+}

    [-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C   -O0  (test for excess errors)
    [-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C   -O1  (test for excess errors)
    [-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C   -O2  (test for excess errors)
    [-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C   -O3 -g  (test for excess errors)
    [-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C   -Os  (test for excess errors)

    [-FAIL:-]{+PASS:+} gfortran.dg/bind-c-contiguous-1.f90   -O0  (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} gfortran.dg/bind-c-contiguous-1.f90   -O0  [-compilation failed to produce executable-]{+execution test+}

    [-FAIL:-]{+PASS:+} gfortran.dg/bind-c-contiguous-4.f90   -O0  (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} gfortran.dg/bind-c-contiguous-4.f90   -O0  [-compilation failed to produce executable-]{+execution test+}

    [-FAIL:-]{+PASS:+} gfortran.dg/bind-c-contiguous-5.f90   -O0  (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} gfortran.dg/bind-c-contiguous-5.f90   -O0  [-compilation failed to produce executable-]{+execution test+}

    [-FAIL:-]{+PASS:+} 20_util/to_chars/double.cc  -std=gnu++17 (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} 20_util/to_chars/double.cc  -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}

    [-FAIL:-]{+PASS:+} 20_util/to_chars/float.cc  -std=gnu++17 (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} 20_util/to_chars/float.cc  -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}

    [-FAIL:-]{+PASS:+} special_functions/13_ellint_3/check_value.cc  -std=gnu++17 (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} special_functions/13_ellint_3/check_value.cc  -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}

    [-FAIL:-]{+PASS:+} tr1/5_numerical_facilities/special_functions/14_ellint_3/check_value.cc  -std=gnu++17 (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} tr1/5_numerical_facilities/special_functions/14_ellint_3/check_value.cc  -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}

..., and progress likewise, but fail later with an unrelated error:

    [-FAIL:-]{+PASS:+} ext/special_functions/hyperg/check_value.cc  -std=gnu++17 (test for excess errors)
    [-UNRESOLVED:-]{+FAIL:+} ext/special_functions/hyperg/check_value.cc  -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}

    [...]/libstdc++-v3/testsuite/ext/special_functions/hyperg/check_value.cc:12317: void test(const testcase_hyperg<Ret> (&)[Num], Ret) [with Ret = double; unsigned int Num = 19]: Assertion 'max_abs_frac < toler' failed.

..., and:

    [-FAIL:-]{+PASS:+} tr1/5_numerical_facilities/special_functions/17_hyperg/check_value.cc  -std=gnu++17 (test for excess errors)
    [-UNRESOLVED:-]{+FAIL:+} tr1/5_numerical_facilities/special_functions/17_hyperg/check_value.cc  -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}

    [...]/libstdc++-v3/testsuite/tr1/5_numerical_facilities/special_functions/17_hyperg/check_value.cc:12316: void test(const testcase_hyperg<Ret> (&)[Num], Ret) [with Ret = double; unsigned int Num = 19]: Assertion 'max_abs_frac < toler' failed.

We progress:

    nvptx-run: error getting kernel result: an illegal instruction was encountered (CUDA_ERROR_ILLEGAL_INSTRUCTION, 715)

... into:

    PASS: g++.dg/cpp1z/inline-var1.C  -std=gnu++17 (test for excess errors)
    [-FAIL:-]{+PASS:+} g++.dg/cpp1z/inline-var1.C  -std=gnu++17 execution test
    PASS: g++.dg/cpp1z/inline-var1.C  -std=gnu++20 (test for excess errors)
    [-FAIL:-]{+PASS:+} g++.dg/cpp1z/inline-var1.C  -std=gnu++20 execution test
    PASS: g++.dg/cpp1z/inline-var1.C  -std=gnu++26 (test for excess errors)
    [-FAIL:-]{+PASS:+} g++.dg/cpp1z/inline-var1.C  -std=gnu++26 execution test

(A lot of '.const' -> '.global' etc.  Haven't researched what the actual
problem was.)

We progress:

    ptxas /tmp/cc5TSZZp.o, line 142; error   : State space incorrect for instruction 'st'
    ptxas /tmp/cc5TSZZp.o, line 174; error   : State space incorrect for instruction 'st'
    ptxas fatal   : Ptx assembly aborted due to errors
    nvptx-as: ptxas returned 255 exit status

... into:

    [-FAIL:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C   -O0  (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C   -O0  [-compilation failed to produce executable-]{+execution test+}
    PASS: g++.dg/torture/builtin-clear-padding-1.C   -O1  (test for excess errors)
    PASS: g++.dg/torture/builtin-clear-padding-1.C   -O1  execution test
    [-FAIL:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C   -O2  (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C   -O2  [-compilation failed to produce executable-]{+execution test+}
    [-FAIL:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C   -O3 -g  (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C   -O3 -g  [-compilation failed to produce executable-]{+execution test+}
    [-FAIL:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C   -Os  (test for excess errors)
    [-UNRESOLVED:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C   -Os  [-compilation failed to produce executable-]{+execution test+}

This indeed tried to write ('st.const') into 's2', which was '.const'
(also: 's1' was '.const') -- even though, no explicit 'const' in
'g++.dg/torture/builtin-clear-padding-1.C'; "interesting".

We progress:

    error   : Memory space doesn't match for '_ZNSt3tr18__detail12__prime_listE' in 'input file 3 at offset 53085', first specified in 'input file 1 at offset 1924'
    nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)

... into execution test PASS for a few dozen libstdc++ test cases.

We progress:

    error   : Memory space doesn't match for '_ZNSt6locale17_S_twinned_facetsE' in 'input file 11 at offset 479903', first specified in 'input file 9 at offset 59300'
    nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)

... into:

    PASS: g++.dg/tree-ssa/pr20458.C  -std=gnu++17 (test for excess errors)
    [-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C  -std=gnu++17 execution test
    PASS: g++.dg/tree-ssa/pr20458.C  -std=gnu++26 (test for excess errors)
    [-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C  -std=gnu++26 execution test

..., and likewise for a few hundred libstdc++ test cases.

We progress:

    error   : Memory space doesn't match for '_ZNSt6locale5_Impl19_S_facet_categoriesE' in 'input file 11 at offset 821962', first specified in 'input file 10 at offset 676317'
    nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)

... into execution test PASS for about a hundred libstdc++ test cases.

We progress:

    error   : Memory space doesn't match for '_ctype_' in 'input file 22 at offset 1698331', first specified in 'input file 9 at offset 57095'
    nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)

... into execution test PASS for another few libstdc++ test cases.

PR target/119573
gcc/
* config/nvptx/nvptx.cc (nvptx_encode_section_info): Don't set
'DATA_AREA_CONST' for 'TREE_CONSTANT', or 'TREE_READONLY'.
(nvptx_asm_declare_constant_name): Use '.global' instead of
'.const'.
gcc/testsuite/
* gcc.c-torture/compile/pr46534.c: Don't 'dg-skip-if' nvptx.
* gcc.target/nvptx/decl.c: Adjust.
libstdc++-v3/
* config/cpu/nvptx/t-nvptx (AM_MAKEFLAGS): Don't amend.

(cherry picked from commit 5deeae29dab2af64e3342daf7a30000e424c64ea)

nvptx: In offloading compilation, special-case certain host-setup symbol aliases: avoid unused label 'emit_ptx_alias' diagnostic

Minor fix-up for commit 65b31b3fff2fced015ded1026733605f34053796
"nvptx: In offloading compilation, special-case certain host-setup symbol aliases [PR101544]",
as of which we see for non-offloading configurations:

    +[...]/source-gcc/gcc/config/nvptx/nvptx.cc: In function 'void nvptx_asm_output_def_from_decls(FILE*, tree, tree)':
    +[...]/source-gcc/gcc/config/nvptx/nvptx.cc:7769:2: warning: label 'emit_ptx_alias' defined but not used [-Wunused-label]
    + 7769 |  emit_ptx_alias:
    +      |  ^~~~~~~~~~~~~~
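
The general shape of such a fix can be sketched as follows: a label that is
only jumped to from conditionally compiled code is itself defined under the
same condition, so configurations without the 'goto' do not see an unused
label.  'SKETCH_ACCEL_COMPILER' and 'output_def_sketch' are hypothetical
stand-ins, not the actual 'nvptx.cc' code.

```cpp
#include <cassert>

// Hypothetical stand-in for a configuration macro such as 'ACCEL_COMPILER'.
// It is left undefined here to model the non-offloading configuration.
// #define SKETCH_ACCEL_COMPILER 1

// Returns the number of validity checks that were run before "emitting".
int output_def_sketch(bool special_case)
{
  int checks_run = 0;
#ifdef SKETCH_ACCEL_COMPILER
  if (special_case)
    goto emit_alias;   // the only jump to the label
#else
  (void) special_case; // unused in this configuration
#endif
  ++checks_run;        // checks that the special case would skip
#ifdef SKETCH_ACCEL_COMPILER
 emit_alias:           // defined only where the 'goto' exists
#endif
  return checks_run;
}
```

With the macro undefined, neither the 'goto' nor the label is compiled, so
'-Wunused-label' has nothing to report.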

gcc/
* config/nvptx/nvptx.cc (nvptx_asm_output_def_from_decls)
[!ACCEL_COMPILER]: Don't define label 'emit_ptx_alias'.

(cherry picked from commit 175016de6f9d800343ce31cf1837a3265569b657)

OpenACC 2.7: adjust 2.6 references to 2.7

More adjustments to indicate OpenACC 2.7 support.

2025-04-11 Chung-Lin Tang <cltang@baylibre.com>

gcc/fortran/ChangeLog:

* intrinsic.texi (OpenACC Module OPENACC): Adjust version
references to 2.7 from 2.6.

libgomp/ChangeLog:

* libgomp.texi (Enabling OpenACC): Adjust version
references to 2.7 from 2.6.
* openacc.f90 (module openacc): Adjust openacc_version to 201811.
* openacc_lib.h (openacc_version): Adjust openacc_version to 201811.
* testsuite/libgomp.oacc-fortran/openacc_version-1.f: Adjust
test value to 201811.
* testsuite/libgomp.oacc-fortran/openacc_version-2.f90: Likewise.

OpenACC 2.7: update _OPENACC value test in testcases

Adjust value test of _OPENACC to 201811 for OpenACC 2.7 update.

2025-04-11 Chung-Lin Tang <cltang@baylibre.com>

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/openacc-define-3.c: Adjust test.
* gfortran.dg/openacc-define-3.f90: Adjust test.

OpenACC 2.7: update _OPENACC symbol to 201811

This patch updates the _OPENACC preprocessor symbol to "201811",
to indicate OpenACC 2.7 support.
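
User code can test the symbol along these lines.  This is a minimal sketch;
the function name 'openacc_level_sketch' and the returned strings are
hypothetical, and '_OPENACC' is only defined when OpenACC is enabled (for
example via '-fopenacc').

```cpp
#include <cassert>
#include <string>

// Reports which OpenACC level the compiler advertises via '_OPENACC'.
// 201811 is the value this patch uses to indicate OpenACC 2.7.
std::string openacc_level_sketch()
{
#if !defined(_OPENACC)
  return "OpenACC not enabled";
#elif _OPENACC >= 201811
  return "OpenACC 2.7 or later";
#else
  return "OpenACC older than 2.7";
#endif
}
```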

2025-04-11 Chung-Lin Tang <cltang@baylibre.com>

gcc/c-family/ChangeLog:

* c-cppbuiltin.cc (c_cpp_builtins): Updated _OPENACC to "201811"
for OpenACC 2.7.

gcc/fortran/ChangeLog:

* cpp.cc (cpp_define_builtins): Updated _OPENACC to "201811"
for OpenACC 2.7.

Merge remote-tracking branch 'origin/releases/gcc-14' into devel/omp/gcc-14

Merge up to r14-11542-g059107eb22c480 (8th Apr 2025)

OpenMP: Fix append_args handling in modify_call_for_omp_dispatch

At tree level, the addr ref is also required for array dummy arguments,
contrary to C; the GOMP_interop calls in modify_call_for_omp_dispatch
were updated accordingly (using build_fold_addr_expr).

As the GOMP_interop calls had no location data associated with them,
the init call happened as soon as executing the previous line of code,
which was confusing; solution: use the location data of the function
call itself.

PR middle-end/119662

gcc/ChangeLog:

* gimplify.cc (modify_call_for_omp_dispatch): Fix GOMP_interop
arg passing; add location info to function calls.

libgomp/ChangeLog:

* testsuite/libgomp.c/append-args-fr-1.c: New test.
* testsuite/libgomp.c/append-args-fr.h: New test.

gcc/testsuite/ChangeLog:
* c-c++-common/gomp/append-args-interop.c: Update for fixed
GOMP_interop call.
* g++.dg/gomp/append-args-8.C: Likewise.
* gfortran.dg/gomp/append-args-interop.f90: Likewise.

(cherry picked from commit 0f77d88fdf797842ac0134a4013b4227dd5a658f)

libstdc++: Fix use-after-free in std::format [PR119671]

When formatting floating-point values to wide strings there's a case
where we invalidate a std::wstring buffer while a std::wstring_view is
still referring to it.

libstdc++-v3/ChangeLog:

PR libstdc++/119671
* include/std/format (__formatter_fp::format): Do not invalidate
__wstr unless _M_localized returns a valid string.
* testsuite/std/format/functions/format.cc: Check wide string
formatting of floating-point types with classic locale.

Reviewed-by: Tomasz Kaminski <tkaminsk@redhat.com>
(cherry picked from commit e33b62eed7fd0a82d758b23252d288585b6790d2)

libstdc++: Add new header to Doxygen config file

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (INPUT): Add text_encoding.

(cherry picked from commit 5430fcd1a3222d62c1b9560de251268c8bc50303)

libstdc++: Replace use of __mindist in ranges::uninitialized_xxx algos [PR101587]

In r15-8980-gf4b6acfc36fb1f I introduced a new function object for
finding the smaller of two distances. In bugzilla Hewill Kang pointed
out that we still need to explicitly convert the result back to the
right difference type, because the result might be an integer-like class
type that doesn't convert to an integral type explicitly.

Rather than doing that conversion in the __mindist function object, I
think it's simpler to remove it again and just do a comparison and
assignment. We always want the result to have a specific type, so we can
just check if the value of the other type is smaller, and then convert
that to the result type if so.
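
The "comparison and assignment" idea can be sketched as below.  This is an
illustration only, with a hypothetical helper name
('clamp_distance_sketch'); the actual code in 'ranges_uninitialized.h' is
written inline and may look different.

```cpp
#include <cassert>

// Hypothetical helper: clamp 'n' (which has the type we want) to 'other'
// (which may have a different signed type).  Unlike std::min, this does not
// require both arguments to have the same type.
template<typename Want, typename Other>
constexpr Want clamp_distance_sketch(Want n, Other other)
{
  if (other < n)                   // mixed-type comparison
    n = static_cast<Want>(other);  // explicit conversion to the wanted type
  return n;
}
```

The explicit 'static_cast' is the point: an integer-like class type might
only offer an explicit conversion, so the result is converted deliberately
rather than relying on an implicit one.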

libstdc++-v3/ChangeLog:

PR libstdc++/101587
* include/bits/ranges_uninitialized.h (__detail::__mindist):
Remove.
(ranges::uninitialized_copy, ranges::uninitialized_copy_n)
(ranges::uninitialized_move, ranges::uninitialized_move_n): Use
comparison and assignment instead of __mindist.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/constrained.cc:
Check with ranges that use integer-like class type for
difference type.
* testsuite/20_util/specialized_algorithms/uninitialized_move/constrained.cc:
Likewise.

Reviewed-by: Tomasz Kaminski <tkaminsk@redhat.com>
Reviewed-by: Hewill Kang <hewillk@gmail.com>
(cherry picked from commit 03ac8886e5c1fa16da90276fd721a57fa9435f4f)

libstdc++: Replace use of std::min in ranges::uninitialized_xxx algos [PR101587]

Because ranges can have any signed integer-like type as difference_type,
it's not valid to use std::min(diff1, diff2). Instead of calling
std::min with an explicit template argument, this adds a new __mindist
helper that determines the common type and uses that with std::min.

libstdc++-v3/ChangeLog:

PR libstdc++/101587
* include/bits/ranges_uninitialized.h (__detail::__mindist):
New function object.
(ranges::uninitialized_copy, ranges::uninitialized_copy_n)
(ranges::uninitialized_move, ranges::uninitialized_move_n): Use
__mindist instead of std::min.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/constrained.cc:
Check ranges with different difference types.
* testsuite/20_util/specialized_algorithms/uninitialized_move/constrained.cc:
Likewise.

(cherry picked from commit f4b6acfc36fb1f72fdd5bf4da208515e6495a062)

LoongArch: Add LoongArch architecture detection to __float128 support in libgfortran and libquadmath [PR119408].

In GCC 14, LoongArch added __float128 as an alias for _Float128.
Commit r15-8962 added support for q/Q suffixes for 128-bit floating-point
numbers.  This causes the compiler to automatically link libquadmath
when compiling Fortran programs.  But on LoongArch `long double` is
IEEE quad, so there is no need to implement libquadmath.
This causes a link failure.

PR target/119408

libgfortran/ChangeLog:

* acinclude.m4: When checking for __float128 support, determine
whether the current architecture is LoongArch.  If so, return false.
* configure: Regenerate.

libquadmath/ChangeLog:

* configure.ac: When checking for __float128 support, determine
whether the current architecture is LoongArch.  If so, return false.
* configure: Regenerate.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Jakub Jelinek <jakub@redhat.com>
(cherry picked from commit 1534f0099c98ea14c08a401302b05edf2231f411)

Daily bump.

libstdc++: Work around C++20 tuple<tuple<any>> constraint recursion [PR116440]

The type tuple<tuple<any>> is clearly copy/move constructible, but for
reasons that are not yet completely understood checking this triggers
constraint recursion with our C++20 tuple implementation (but not the
C++17 implementation).

It turns out this recursion stems from considering the non-template
tuple(const _Elements&) constructor during the copy/move constructibility
check.  Considering this constructor is ultimately redundant, since the
defaulted copy/move constructors are better matches.

GCC has a non-standard "perfect candidate" optimization[1] that causes
overload resolution to shortcut considering template candidates if we
find a (non-template) perfect candidate.  So to work around this issue
(and as a general compile-time optimization) this patch turns the
problematic constructor into a template so that GCC doesn't consider it
when checking for copy/move constructibility of this tuple type.

Changing the template-ness of a constructor can affect overload
resolution (since template-ness is a tiebreaker) so there's a risk this
change could e.g. introduce overload resolution ambiguities.  But the
original C++17 implementation has long defined this constructor as a
template (in order to constrain it etc), so doing the same thing in the
C++20 mode should naturally be quite safe.
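
The tiebreaker mentioned above can be seen in a small, self-contained sketch
(plain functions rather than tuple constructors; all names are hypothetical):

```cpp
#include <cassert>

// Non-template candidate.
int pick_sketch(const int &) { return 1; }

// Template candidate; for an 'int' argument it is an equally good match.
template<typename T>
int pick_sketch(const T &) { return 2; }

int call_with_int_sketch()
{
  int i = 0;
  return pick_sketch(i);   // tie on conversions: the non-template is preferred
}

int call_with_long_sketch()
{
  long l = 0;
  return pick_sketch(l);   // the template is an exact match and wins
}
```

This only shows the standard overload-resolution rule; it does not model the
"perfect candidate" shortcut or the constraint recursion itself.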

The testcase still fails with Clang (in C++20 mode) since it doesn't
implement said optimization.

[1]: See r11-7287-g187d0d5871b1fa and
https://isocpp.org/files/papers/P3606R0.html

PR libstdc++/116440

libstdc++-v3/ChangeLog:

* include/std/tuple (tuple::tuple(const _Elements&...))
[C++20]: Turn into a template.
* testsuite/20_util/tuple/116440.C: New test.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit 6570fa6f2612a4e4ddd2fcfc119369a1a48656e4)

c++: constinit and value-initialization [PR119652]

Value-initialization built an AGGR_INIT_EXPR to set AGGR_INIT_ZERO_FIRST on.
Passing that AGGR_INIT_EXPR to maybe_constant_value returned a TARGET_EXPR,
which potential_constant_expression_1 mistook for a temporary.

We shouldn't add a TARGET_EXPR to the AGGR_INIT_EXPR in this case, just like
we already avoid adding it to CONSTRUCTOR or CALL_EXPR.

PR c++/119652

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): Also don't add a
TARGET_EXPR around AGGR_INIT_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constinit20.C: New test.

(cherry picked from commit c7dc9b6f889fa8f9e4ef060c3af107eaf54265c5)

c++: __FUNCTION__ in lambda return type [PR118629]

In this testcase, the use of __FUNCTION__ is within a function parameter
scope, the lambda's. And P1787 changed __func__ to live in the parameter
scope. But [basic.scope.pdecl] says that the point of declaration of
__func__ is immediately before {, so in the trailing return type it isn't in
scope yet, so this __FUNCTION__ should refer to foo().

Looking first for a block scope, then a function parameter scope, gives us
the right result.

PR c++/118629

gcc/cp/ChangeLog:

* name-lookup.cc (pushdecl_outermost_localscope): Look for an
sk_block.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-__func__3.C: New test.

(cherry picked from commit 7d561820525fd3b9d8f3876333c0584d75e7c053)

Daily bump.

c++: lambda in requires outside template [PR99546]

Since r10-7441 we set processing_template_decl in a requires-expression so
that we can use tsubst_expr to evaluate the requirements, but that confuses
lambdas terribly; begin_lambda_type silently returns error_mark_node and we
continue into other failures. This patch clears processing_template_decl
again while we're defining the closure and op() function, so it only remains
set while parsing the introducer (i.e. any init-captures) and building the
resulting object. This properly avoids trying to create another lambda in
tsubst_lambda_expr.

PR c++/99546
PR c++/113925
PR c++/106976
PR c++/109961
PR c++/117336

gcc/cp/ChangeLog:

* lambda.cc (build_lambda_object): Handle fake
requires-expr processing_template_decl.
* parser.cc (cp_parser_lambda_expression): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-requires2.C: New test.
* g++.dg/cpp2a/lambda-requires3.C: New test.
* g++.dg/cpp2a/lambda-requires4.C: New test.
* g++.dg/cpp2a/lambda-requires5.C: New test.

(cherry picked from commit 25992d8daff60726a247ec7850d540aed5335639)

c++: constraint variable used in evaluated context [PR117849]

Here we wrongly reject the type-requirement at parse time due to its use
of the constraint variable 't' within a template argument (an evaluated
context). Fix this simply by refining the "use of parameter outside
function body" error path to exclude constraint variables.

PR c++/104255 tracks the same issue for function parameters, but fixing
that would be more involved, requiring changes to the PARM_DECL case of
tsubst_expr.

PR c++/117849

gcc/cp/ChangeLog:

* semantics.cc (finish_id_expression_1): Allow use of constraint
variable outside an unevaluated context.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires41.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 6e973e87e3fec6f33e97edf8fce2fcd121e53961)

c++: P2280R4 and speculative constexpr folding [PR119387]

Compiling the testcase in this PR uses 2.5x more memory and 6x more
time ever since r14-5979 which implements P2280R4.  This is because
our speculative constexpr folding now does a lot more work trying to
fold ultimately non-constant calls to constexpr functions, and in turn
produces a lot of garbage.  We do sometimes successfully fold more
thanks to P2280R4, but it seems to be trivial stuff like calls to
std::array::size or std::addressof.  The benefit of P2280 therefore
doesn't seem worth the cost during speculative constexpr folding, so
this patch restricts the paper to only manifestly-constant evaluation.

PR c++/119387

gcc/cp/ChangeLog:

* constexpr.cc (p2280_active_p): New.
(cxx_eval_constant_expression) <case VAR_DECL>: Use it to
restrict P2280 relaxations.
<case PARM_DECL>: Likewise.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit a926345f22b500a2620adb83e6821e01fb8cc8fd)

Ada: Fix thinko in Eigensystem for complex Hermitian matrices

The implementation solves the eigensystem for a NxN complex Hermitian matrix
by first solving it for a 2Nx2N real symmetric matrix and then interpreting
the 2Nx1 real vectors as Nx1 complex ones, but the last step does not work.
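
For reference, one standard way to write this embedding is sketched below.
The symbols A, B, M, u, v are introduced here for illustration, and the exact
layout used in 'a-ngcoar.adb' is not shown in this message and may differ.

```latex
\begin{align*}
H &= A + iB, \qquad A^{T} = A, \quad B^{T} = -B, \\
M &= \begin{pmatrix} A & -B \\ B & A \end{pmatrix}, \qquad M^{T} = M, \\
M \begin{pmatrix} u \\ v \end{pmatrix}
  = \lambda \begin{pmatrix} u \\ v \end{pmatrix}
&\iff
\begin{cases} Au - Bv = \lambda u \\ Bu + Av = \lambda v \end{cases}
\iff
H\,(u + iv) = \lambda\,(u + iv).
\end{align*}
```

Under this layout, a real eigenvector (u; v) of M corresponds to the complex
eigenvector u + iv of H, and each eigenvalue of H appears twice among the
eigenvalues of M.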

The patch fixes the last step and also performs a small cleanup throughout
the implementation, mostly in the commentary and without functional changes.

gcc/ada/
* libgnat/a-ngcoar.adb (Eigensystem): Adjust notation and fix the
layout of the real symmetric matrix in the main comment. Adjust
the layout of the associated code accordingly and correctly turn
the 2Nx1 real vectors into Nx1 complex ones.
(Eigenvalues): Minor similar tweaks.
* libgnat/a-ngrear.adb (Jacobi): Minor tweaks in the main comment.
Adjust notation and corresponding parameter names of functions.
Fix call to Unit_Matrix routine. Adjust the comment describing
the various kinds of iterations to match the implementation.

vect: Relax scan-tree-dump strict pattern matching [PR118597]

Using specific SSA names in pattern matching in `dg-final' makes tests
"unstable", in that changes in passes prior to the pass whose dump is
analyzed in the particular test may change the numbering of the SSA
variables, causing the test to start failing spuriously.

We thus switch from specific SSA names to the use of a multi-line
regular expression making use of capture groups for matching particular
variables across different statements, ensuring the test will pass
more consistently across different versions of GCC.

PR testsuite/118597

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-fncall-mask.c: Update test directives.

Daily bump.

OpenMP: Require target and/or targetsync init modifier [PR118965]

As noted in PR 118965, the initial interop implementation overlooked
the requirement in the OpenMP spec that at least one of the "target"
and "targetsync" modifiers is required in both the interop construct
init clause and the declare variant append_args clause.

Adding the check was fairly straightforward, but it broke about a
gazillion existing test cases.  In particular, things like "init (x, y)"
which were previously accepted (and tested for being accepted) aren't
supposed to be allowed by the spec, much less things like "init (target)"
where target was previously interpreted as a variable name instead of a
modifier.  Since one of the effects of the change is that at least one
modifier is always required, I found that deleting all the code that was
trying to detect and handle the no-modifier case allowed for better
diagnostics.

gcc/c/ChangeLog
PR middle-end/118965
* c-parser.cc (c_parser_omp_clause_init_modifiers): Adjust
error message.
(c_parser_omp_clause_init): Remove code for recognizing clauses
without modifiers.  Diagnose missing target/targetsync modifier.
(c_finish_omp_declare_variant): Diagnose missing target/targetsync
modifier.

gcc/cp/ChangeLog
PR middle-end/118965
* parser.cc (c_parser_omp_clause_init_modifiers): Adjust
error message.
(cp_parser_omp_clause_init): Remove code for recognizing clauses
without modifiers.  Diagnose missing target/targetsync modifier.
(cp_finish_omp_declare_variant): Diagnose missing target/targetsync
modifier.

gcc/fortran/ChangeLog
PR middle-end/118965
* openmp.cc (gfc_parser_omp_clause_init_modifiers): Fix some
inconsistent code indentation.  Remove code for recognizing
clauses without modifiers.  Diagnose prefer_type without a
following paren.  Adjust error message for an unrecognized modifier.
Diagnose missing target/targetsync modifier.
(gfc_match_omp_init): Fix more inconsistent code indentation.

gcc/testsuite/ChangeLog
PR middle-end/118965
* c-c++-common/gomp/append-args-1.c: Add target/targetsync
modifiers so tests do what they were previously supposed to do.
Adjust expected output.
* c-c++-common/gomp/append-args-7.c: Likewise.
* c-c++-common/gomp/append-args-8.c: Likewise.
* c-c++-common/gomp/append-args-9.c: Likewise.
* c-c++-common/gomp/interop-1.c: Likewise.
* c-c++-common/gomp/interop-2.c: Likewise.
* c-c++-common/gomp/interop-3.c: Likewise.
* c-c++-common/gomp/interop-4.c: Likewise.
* c-c++-common/gomp/pr118965-1.c: New.
* c-c++-common/gomp/pr118965-2.c: New.
* g++.dg/gomp/append-args-1.C: Add target/targetsync modifiers
and adjust expected output.
* g++.dg/gomp/append-args-2.C: Likewise.
* g++.dg/gomp/append-args-6.C: Likewise.
* g++.dg/gomp/append-args-7.C: Likewise.
* g++.dg/gomp/append-args-8.C: Likewise.
* g++.dg/gomp/interop-5.C: Likewise.
* gfortran.dg/gomp/append_args-1.f90: Add target/targetsync
modifiers and adjust expected output.
* gfortran.dg/gomp/append_args-2.f90: Likewise.
* gfortran.dg/gomp/append_args-3.f90: Likewise.
* gfortran.dg/gomp/append_args-4.f90: Likewise.
* gfortran.dg/gomp/interop-1.f90: Likewise.
* gfortran.dg/gomp/interop-2.f90: Likewise.
* gfortran.dg/gomp/interop-3.f90: Likewise.
* gfortran.dg/gomp/interop-4.f90: Likewise.
* gfortran.dg/gomp/pr118965-1.f90: New.
* gfortran.dg/gomp/pr118965-2.f90: New.

(cherry picked from commit aca8155c09001f269a20d6df438fa0e749dd5388)

libstdc++: Restored accidentally removed test case.

It was removed by accident in r14-11523-gad1b71fc2882c1.

libstdc++-v3/ChangeLog:

* testsuite/std/format/functions/format.cc: Restored line.

(cherry picked from commit 81c990aa84b22562157ce2926577b392b4a129d3)

libstdc++: Fix handling of field width for wide strings and characters [PR119593]

This patch corrects handling of UTF-32LE and UTF-32BE in
__unicode::__literal_encoding_is_unicode<_CharT>, so they are
recognized as Unicode and the functions produce correct results for wchar_t.

Use `__unicode::__field_width` to compute the estimated width
of the character for Unicode wide encodings.

PR libstdc++/119593

libstdc++-v3/ChangeLog:

* include/bits/unicode.h
(__unicode::__literal_encoding_is_unicode<_CharT>):
Corrected handling for UTF-16 and UTF-32 with "LE" or "BE" suffix.
* include/std/format (__formatter_str::_S_character_width):
Define.
(__formatter_str::_S_character_width): Updated passed char
length.
* testsuite/std/format/functions/format.cc: Test for wchar_t.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>

Fortran: Fix freeing procedure pointer components [PR119380]

Backported from gcc-15.

PR fortran/119380

gcc/fortran/ChangeLog:

* trans-array.cc (structure_alloc_comps): Prevent freeing of
procedure pointer components.

gcc/testsuite/ChangeLog:

* gfortran.dg/proc_ptr_comp_54.f90: New test.

Daily bump.

tree-optimization/119145 - avoid stray .MASK_CALL after vectorization

When we BB vectorize an if-converted loop body we make sure to not
leave around .MASK_LOAD or .MASK_STORE created by if-conversion but
we failed to check for .MASK_CALL.

PR tree-optimization/119145
* tree-vectorizer.cc (try_vectorize_loop_1): Avoid BB
vectorizing an if-converted loop body when there's a .MASK_CALL
in the loop body.

* gcc.dg/vect/pr119145.c: New testcase.

(cherry picked from commit 7950d4cceb9fc7559b1343c95fc651cefbe287a0)

middle-end/119119 - re-gimplification of empty CTOR assignments

The following testcase runs into a re-gimplification issue during
inlining when processing

  MEM[(struct e *)this_2(D)].a = {};

where re-gimplification does not handle assignments in the same
way as the gimplifier but instead relies on rhs_predicate_for
and gimplifying the RHS standalone.  This fails to handle
special-casing of CTORs.  The is_gimple_mem_rhs_or_call predicate
already handles clobbers but not empty CTORs so we end up in
the fallback code trying to force the CTOR into a separate stmt
using a temporary - but as we have a non-copyable type here that ICEs.

The following generalizes empty CTORs in is_gimple_mem_rhs_or_call
since those need no additional re-gimplification.

PR middle-end/119119
* gimplify.cc (is_gimple_mem_rhs_or_call): All empty CTORs
are OK when not a register type.

* g++.dg/torture/pr11911.C: New testcase.

(cherry picked from commit 3bd61c1dfaa2d7153eb4be82f423533ea937d0f9)

tree-optimization/119096 - bogus conditional reduction vectorization

When we vectorize a .COND_ADD reduction and apply the single-use-def
cycle optimization we can end up choosing the wrong else value for
subsequent .COND_ADD. The following rectifies this.

PR tree-optimization/119096
* tree-vect-loop.cc (vect_transform_reduction): Use the
correct else value for .COND_fn.

* gcc.dg/vect/pr119096.c: New testcase.

(cherry picked from commit 10e4107dfcf9fe324d0902f16411a75c596dab91)

ipa/119067 - bogus TYPE_PRECISION check on VECTOR_TYPE

odr_types_equivalent_p can end up using TYPE_PRECISION on vector
types which is a no-go. The following instead uses TYPE_VECTOR_SUBPARTS
for vector types so we also end up comparing the number of vector elements.

PR ipa/119067
* ipa-devirt.cc (odr_types_equivalent_p): Check
TYPE_VECTOR_SUBPARTS for vectors.

* g++.dg/lto/pr119067_0.C: New testcase.
* g++.dg/lto/pr119067_1.C: Likewise.

(cherry picked from commit f22e89167b3abfbf6d67f42fc4d689d8ffdc1810)

tree-optimization/119057 - bogus double reduction detection

We are detecting a cycle as double reduction where the inner loop
cycle has extra out-of-loop uses.  This clashes at least with
assumptions from the SLP discovery code which says the cycle
isn't reachable from another SLP instance.  It also was not intended
to support this case, in fact with GCC 14 we seem to generate wrong
code here.

PR tree-optimization/119057
* tree-vect-loop.cc (check_reduction_path): Add argument
specifying whether we're analyzing the inner loop of a
double reduction.  Do not allow extra uses outside of the
double reduction cycle in this case.
(vect_is_simple_reduction): Adjust.

* gcc.dg/vect/pr119057.c: New testcase.

(cherry picked from commit 758de6263dfc7ba8701965fa468691ac23cb7eb5)

lto/114501 - missed free-lang-data for CONSTRUCTOR index

The following makes sure to also walk CONSTRUCTOR element indexes
which can be FIELD_DECLs, referencing otherwise unused types we
need to clean. walk_tree only walks CONSTRUCTOR element data.

PR lto/114501
* ipa-free-lang-data.cc (find_decls_types_r): Explicitly
handle CONSTRUCTORs as walk_tree handling of those is
incomplete.

* g++.dg/pr114501_0.C: New testcase.

(cherry picked from commit fdd95e1cf29137a19baed25f8c817d320dfe63e3)

ipa/111245 - bogus modref analysis for store in call that might throw

We currently record a kill for

*x_4(D) = always_throws ();

because we consider the store always executing since the appropriate
check for whether the stmt could throw is guarded by
!cfun->can_throw_non_call_exceptions.
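
To illustrate why recording a kill is wrong here: when the call on the right-hand side throws, the store to *x never executes, so the old value stays observable by the caller. The following is a minimal sketch only; `always_throws`, `store_result` and `observe` are hypothetical names modeled on the snippet above, not the actual testcase.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the callee in the snippet above.
int always_throws() { throw std::runtime_error("boom"); }

// The assignment never completes because the call throws first, so the
// value previously stored in *x must remain visible afterwards.
void store_result(int* x) { *x = always_throws(); }

int observe() {
  int value = 42;
  try {
    store_result(&value);
  } catch (const std::runtime_error&) {
    // The store did not happen; 'value' still holds 42.
  }
  return value;
}
```

An analysis that treats the store as always executed could wrongly conclude that the initial 42 is dead.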

PR ipa/111245
* ipa-modref.cc (modref_access_analysis::analyze_store): Do
not guard the check of whether the stmt could throw by
cfun->can_throw_non_call_exceptions.

* g++.dg/torture/pr111245.C: New testcase.

(cherry picked from commit e6037af6d5e5a43c437257580d75bc8b35a6dcfd)

aarch64: Use PAUTH instead of V8_3A in some places

PR target/119383

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_expand_epilogue): Use TARGET_PAUTH.
* config/aarch64/aarch64.md: Update comment.

(cherry picked from commit 20385cb92cbd4a1934661ab97a162c1e25935836)

libstdc++: Fix std::ranges::iter_move for function references [PR119469]

The result of std::move (or a cast to an rvalue reference) on a function
reference is always an lvalue. Because std::ranges::iter_move was using
the type std::remove_reference_t<X>&& as the result of std::move, it was
giving the wrong type for function references. Use a decltype-specifier
with declval<remove_reference_t<X>>() instead of just using the
remove_reference_t<X>&& type directly. This gives the right result,
while still avoiding the cost of doing overload resolution for
std::move.
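
The value-category rule described above can be checked at compile time. This is a sketch under stated assumptions: `naive_result_t` and `decltype_result_t` are hypothetical aliases that mimic the two approaches, not the actual `_IterMove::__result` code.

```cpp
#include <type_traits>
#include <utility>

using FnRef = void (&)();

// Hypothetical alias: spell the result of std::move as an explicit type.
template <typename X>
using naive_result_t = typename std::remove_reference<X>::type&&;

// Hypothetical alias: let decltype compute the type of a declval call.
template <typename X>
using decltype_result_t =
    decltype(std::declval<typename std::remove_reference<X>::type>());

// For object references both spellings agree.
static_assert(std::is_same<naive_result_t<int&>, int&&>::value, "");
static_assert(std::is_same<decltype_result_t<int&>, int&&>::value, "");

// std::move applied to a function reference is an lvalue...
static_assert(
    std::is_same<decltype(std::move(std::declval<FnRef>())), FnRef>::value,
    "");
// ...which the decltype form reproduces and the explicit type does not.
static_assert(std::is_same<decltype_result_t<FnRef>, FnRef>::value, "");
static_assert(!std::is_same<naive_result_t<FnRef>, FnRef>::value, "");

constexpr bool function_ref_case_ok() {
  return std::is_same<decltype_result_t<FnRef>, FnRef>::value;
}
```

No overload resolution for std::move is involved in `decltype_result_t`, which matches the cost argument in the message.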

libstdc++-v3/ChangeLog:

PR libstdc++/119469
* include/bits/iterator_concepts.h (_IterMove::__result): Use
decltype-specifier instead of an explicit type.
* testsuite/24_iterators/customization_points/iter_move.cc:
Check results for function references.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 3e52eb28c537aaa03afb78ef9dff8325c5f41f78)

libstdc++: Fix ranges::iter_move handling of rvalues [PR106612]

The specification for std::ranges::iter_move apparently requires us to
handle types which do not satisfy std::indirectly_readable, for example
with overloaded operator* which behaves differently for different value
categories.

libstdc++-v3/ChangeLog:

PR libstdc++/106612
* include/bits/iterator_concepts.h (_IterMove::__iter_ref_t):
New alias template.
(_IterMove::__result): Use __iter_ref_t instead of
std::iter_reference_t.
(_IterMove::__type): Remove incorrect __dereferenceable
constraint.
(_IterMove::operator()): Likewise. Add correct constraints. Use
__iter_ref_t instead of std::iter_reference_t. Forward parameter
as correct value category.
(iter_swap): Add comments.
* testsuite/24_iterators/customization_points/iter_move.cc: Test
that iter_move is found by ADL and that rvalue arguments are
handled correctly.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
(cherry picked from commit a8ee522c5923ba17851e4b71316a2dff19d6368f)

libstdc++: Fix -Warray-bounds warning in std::vector<bool> [PR110498]

In this case, we need to tell the compiler that the current size is not
larger than the new size so that all the existing elements can be copied
to the new storage. This avoids bogus warnings about overflowing the new
storage when the compiler can't tell that that cannot happen.

We might as well also hoist the loads of begin() and end() before the
allocation too. All callers will have loaded at least begin() before
calling _M_reallocate.
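
The general shape of such a hint can be sketched as follows. This is not the libstdc++ code: `grow_copy` is a hypothetical helper, and `__builtin_unreachable` is a GCC/Clang builtin, so the sketch assumes one of those compilers.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy 'old_size' bytes into a new buffer of 'new_cap'.
unsigned char* grow_copy(const unsigned char* old_data, std::size_t old_size,
                         std::size_t new_cap) {
  // State the invariant before allocating: the existing elements always
  // fit into the new storage. The optimizer may assume this holds.
  if (old_size > new_cap)
    __builtin_unreachable();

  unsigned char* fresh = new unsigned char[new_cap]();  // zero-initialized
  if (old_size != 0)
    std::memcpy(fresh, old_data, old_size);
  return fresh;
}
```

Callers must really guarantee the stated condition; violating it is undefined behavior rather than a checked error.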

libstdc++-v3/ChangeLog:

PR libstdc++/110498
* include/bits/vector.tcc (vector<bool, A>::_M_reallocate):
Hoist loads of begin() and end() before allocation and use them
to state an unreachable condition.
* testsuite/23_containers/vector/bool/capacity/110498.cc: New
test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit aa3aaf2bfb8fcc17076993df4297597b68bc5f60)

libstdc++: Fix -Wstringop-overread warning in std::vector<bool> [PR114758]

As in r13-4393-gcca06f0d6d76b0 and a few other commits, we can avoid
bogus warnings in std::vector<bool> by hoisting some loads to before the
allocation that calls operator new. This means that the compiler has
enough info to remove the dead branches that trigger bogus warnings.

On trunk this is only needed with -fno-assume-sane-operators-new-delete
but it will help on the branches where that option doesn't exist.

libstdc++-v3/ChangeLog:

PR libstdc++/114758
* include/bits/vector.tcc (vector<bool, A>::_M_fill_insert):
Hoist loads of begin() and end() before allocation.
* testsuite/23_containers/vector/bool/capacity/114758.cc: New
test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 1f6c19f307c8de9830130a0ba071c24e3835beb3)

libstdc++: Fix bogus -Wstringop-overflow in std::vector::insert [PR117983]

This was fixed on trunk by r15-4473-g3abe751ea86e34, but that isn't
suitable for backporting. Instead, just add another unreachable
condition in std::vector::_M_range_insert so the compiler knows this
memcpy doesn't use a length originating from a negative ptrdiff_t
converted to a very positive size_t.
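
The conversion the compiler worries about can be shown in isolation. This is an illustrative sketch only; `as_length` and `hinted_length` are hypothetical names, and `__builtin_unreachable` assumes GCC or Clang.

```cpp
#include <cstddef>
#include <limits>

constexpr std::size_t huge = std::numeric_limits<std::size_t>::max();

// A negative difference converted to size_t wraps to a huge value.
constexpr std::size_t as_length(std::ptrdiff_t diff) {
  return static_cast<std::size_t>(diff);
}
static_assert(as_length(-1) == huge, "negative ptrdiff_t wraps around");

// Telling the compiler that first <= last rules out the negative case.
std::size_t hinted_length(const int* first, const int* last) {
  if (last < first)
    __builtin_unreachable();
  return static_cast<std::size_t>(last - first);
}
```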

libstdc++-v3/ChangeLog:

PR libstdc++/117983
* include/bits/vector.tcc (vector::_M_range_insert): Add
unreachable condition to tell the compiler begin() <= end().
* testsuite/23_containers/vector/modifiers/insert/117983.cc: New
test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 878812b6f6905774ab37cb78903e3e11bf1c508c)

libstdc++: Fix -Warray-bounds warning in std::vector::resize [PR114945]

This is yet another false positive warning fix. This time the compiler
can't prove that when the vector has sufficient excess capacity to
append new elements, the pointer to the existing storage is not null.

libstdc++-v3/ChangeLog:

PR libstdc++/114945
* include/bits/vector.tcc (vector::_M_default_append): Add
unreachable condition so the compiler knows that _M_finish is
not null.
* testsuite/23_containers/vector/capacity/114945.cc: New test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 844eed3364309bd20cbb7d6793a16b7c6b889ba4)

GCN: Don't emit weak undefined symbols [PR119369]

This resolves all instances of PR119369
"GCN: weak undefined symbols -> execution test FAIL, 'HSA_STATUS_ERROR_VARIABLE_UNDEFINED'";
for all affected test cases, the execution test status progresses FAIL -> PASS.

This however also causes a small number of (expected) regressions, very similar
to GCC/nvptx:

    [-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C  -std=c++17 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C  -std=c++26 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C  -std=c++98 (test for excess errors)

    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++11  scan-assembler .weak[ \t]*_?_ZTH11derived_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++11  scan-assembler .weak[ \t]*_?_ZTH13container_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++11  scan-assembler .weak[ \t]*_?_ZTH8base_obj
    PASS: g++.dg/cpp0x/pr84497.C  -std=c++11 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++17  scan-assembler .weak[ \t]*_?_ZTH11derived_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++17  scan-assembler .weak[ \t]*_?_ZTH13container_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++17  scan-assembler .weak[ \t]*_?_ZTH8base_obj
    PASS: g++.dg/cpp0x/pr84497.C  -std=c++17 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++26  scan-assembler .weak[ \t]*_?_ZTH11derived_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++26  scan-assembler .weak[ \t]*_?_ZTH13container_obj
    [-PASS:-]{+FAIL:+} g++.dg/cpp0x/pr84497.C  -std=c++26  scan-assembler .weak[ \t]*_?_ZTH8base_obj
    PASS: g++.dg/cpp0x/pr84497.C  -std=c++26 (test for excess errors)

    [-PASS:-]{+FAIL:+} g++.dg/ext/weak2.C  -std=gnu++17  scan-assembler weak[^ \t]*[ \t]_?_Z3foov
    PASS: g++.dg/ext/weak2.C  -std=gnu++17 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/ext/weak2.C  -std=gnu++26  scan-assembler weak[^ \t]*[ \t]_?_Z3foov
    PASS: g++.dg/ext/weak2.C  -std=gnu++26 (test for excess errors)
    [-PASS:-]{+FAIL:+} g++.dg/ext/weak2.C  -std=gnu++98  scan-assembler weak[^ \t]*[ \t]_?_Z3foov
    PASS: g++.dg/ext/weak2.C  -std=gnu++98 (test for excess errors)

    [-PASS:-]{+FAIL:+} gcc.dg/attr-weakref-1.c (test for excess errors)
    [-FAIL:-]{+UNRESOLVED:+} gcc.dg/attr-weakref-1.c [-execution test-]{+compilation failed to produce executable+}

    @@ -131211,25 +131211,25 @@ PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?c
    PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?d
    PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?e
    PASS: gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?g
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-1.c scan-assembler weak[^ \t]*[ \t]_?j
    PASS: gcc.dg/weak/weak-1.c scan-assembler-not weak[^ \t]*[ \t]_?i

    PASS: gcc.dg/weak/weak-12.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-12.c scan-assembler weak[^ \t]*[ \t]_?foo

    PASS: gcc.dg/weak/weak-15.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-15.c scan-assembler weak[^ \t]*[ \t]_?a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-15.c scan-assembler weak[^ \t]*[ \t]_?c
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-15.c scan-assembler weak[^ \t]*[ \t]_?d
    PASS: gcc.dg/weak/weak-15.c scan-assembler-not weak[^ \t]*[ \t]_?b

    PASS: gcc.dg/weak/weak-16.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-16.c scan-assembler weak[^ \t]*[ \t]_?kallsyms_token_index
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-16.c scan-assembler weak[^ \t]*[ \t]_?kallsyms_token_table
    PASS: gcc.dg/weak/weak-2.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1b
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1c
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-2.c scan-assembler weak[^ \t]*[ \t]_?ffoo1e
    PASS: gcc.dg/weak/weak-2.c scan-assembler-not weak[^ \t]*[ \t]_?ffoo1d

    PASS: gcc.dg/weak/weak-3.c  (test for warnings, line 58)
    PASS: gcc.dg/weak/weak-3.c  (test for warnings, line 73)
    PASS: gcc.dg/weak/weak-3.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1b
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1c
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1e
    PASS: gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1f
    PASS: gcc.dg/weak/weak-3.c scan-assembler weak[^ \t]*[ \t]_?ffoo1g
    PASS: gcc.dg/weak/weak-3.c scan-assembler-not weak[^ \t]*[ \t]_?ffoo1d

    PASS: gcc.dg/weak/weak-4.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1b
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1c
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1d
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1e
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1f
    @@ -131267,16 +131267,16 @@ PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1i
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1j
    PASS: gcc.dg/weak/weak-4.c scan-assembler weak[^ \t]*[ \t]_?vfoo1k

    PASS: gcc.dg/weak/weak-5.c (test for excess errors)
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1a
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1b
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1c
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1d
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1e
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1f
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1g
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1h
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1i
    [-PASS:-]{+FAIL:+} gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1j
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1k
    PASS: gcc.dg/weak/weak-5.c scan-assembler weak[^ \t]*[ \t]_?vfoo1l

These get 'dg-xfail-if'ed or 'dg-skip-if'ed, (mostly) similar to GCC/nvptx.

PR target/119369
gcc/
* config/gcn/gcn-protos.h (gcn_asm_weaken_decl): Declare.
* config/gcn/gcn.cc (gcn_asm_weaken_decl): New.
* config/gcn/gcn-hsa.h (ASM_WEAKEN_DECL): '#define' to this.
gcc/testsuite/
* g++.dg/abi/pure-virtual1.C: 'dg-xfail-if' GCN.
* g++.dg/cpp0x/pr84497.C: 'dg-skip-if' GCN.
* g++.dg/ext/weak2.C: Likewise.
* gcc.dg/attr-weakref-1.c: Likewise.
* gcc.dg/weak/weak-1.c: Likewise.
* gcc.dg/weak/weak-12.c: Likewise.
* gcc.dg/weak/weak-15.c: Likewise.
* gcc.dg/weak/weak-16.c: Likewise.
* gcc.dg/weak/weak-2.c: Likewise.
* gcc.dg/weak/weak-3.c: Likewise.
* gcc.dg/weak/weak-4.c: Likewise.
* gcc.dg/weak/weak-5.c: Likewise.

(cherry picked from commit 2f58d8ac03911063d6a8887a2bee7b4e25ac1871)

GCN, libstdc++: '#define _GLIBCXX_USE_WEAK_REF 0' [PR119369]

This fixes a few hundred compilation/linking FAILs (similar to PR69506),
where the GCN/LLVM 'ld' reported:

    ld: error: relocation R_AMDGPU_REL32_LO cannot be used against symbol '_ZGTtnam'; recompile with -fPIC
    >>> defined in [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a(cow-stdexcept.o)
    >>> referenced by cow-stdexcept.cc:259 ([...]/libstdc++-v3/src/c++11/cow-stdexcept.cc:259)
    >>>               cow-stdexcept.o:(_txnal_cow_string_C1_for_exceptions(void*, char const*, void*)) in archive [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a

    ld: error: relocation R_AMDGPU_REL32_HI cannot be used against symbol '_ZGTtnam'; recompile with -fPIC
    >>> defined in [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a(cow-stdexcept.o)
    >>> referenced by cow-stdexcept.cc:259 ([...]/source-gcc/libstdc++-v3/src/c++11/cow-stdexcept.cc:259)
    >>>               cow-stdexcept.o:(_txnal_cow_string_C1_for_exceptions(void*, char const*, void*)) in archive [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a

    [...]

..., which is:

    $ c++filt _ZGTtnam
    transaction clone for operator new[](unsigned long)

..., and similarly for other libitm symbols.

However, the affected test cases, if applicable, then run into execution test
FAILs, due to PR119369
"GCN: weak undefined symbols -> execution test FAIL, 'HSA_STATUS_ERROR_VARIABLE_UNDEFINED'".

PR target/119369
libstdc++-v3/
* config/cpu/gcn/cpu_defines.h: New.
* configure.host [GCN] (cpu_defines_dir): Point to it.

(cherry picked from commit 816335960d020eac92d49bc9cd13729afd313da7)

Fix PR testsuite/116271, gcc.dg/vect/tsvc/vect-tsvc-s176.c fails

gcc/testsuite:
PR testsuite/116271
* gcc.dg/vect/tsvc/vect-tsvc-s176.c [TRUNCATE_TEST]: Make sure
that m stays the same as the loop bound of the middle loop.
* gcc.dg/vect/tsvc/tsvc.h (get_expected_result) <s176> [TRUNCATE_TEST]:
Adjust expected value.

(cherry picked from commit beb94f5979953969593a2387561cdbc8fedfaeb1)

Reduce iteration counts for gcc.dg/vect/tsvc tests.

testsuite/
* gcc.dg/vect/tsvc/tsvc.h (iterations): Allow to override,
default to 10.
(get_expected_result): Add values for iterations counts
10, 256 and 3200.
(run): Add code to output values for new iterations counts.
* gcc.dg/vect/tsvc/vect-tsvc-s1119.c (dg-additional-options):
Add -Diterations=LEN_2D .
* gcc.dg/vect/tsvc/vect-tsvc-s115.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s119.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s125.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s2102.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s2233.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s2275.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s231.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s235.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s176.c: (dg-additional-options):
Add -Diterations=3200 .
[!run_expensive_tests]: dg-additional-options "-DTRUNCATE_TEST" .
[TRUNCATE_TEST]: Set m to 32.

(cherry picked from commit 8fac69a2dbff98ebe1feb87faba0d9b81a173c40)

debug/101533 - ICE with variant typedef DIE generation

There's a sanity check in gen_type_die_with_usage that trips
unnecessarily for a case where the relevant DIE has already been
generated successfully in other ways. The following keys the
existing TREE_ASM_WRITTEN check on the correct object, honoring
this and does nothing instead of ICEing for the testcase at hand.

PR debug/101533
* dwarf2out.cc (gen_type_die_with_usage): When we have
output the typedef already do nothing for a typedef variant.
Do not set TREE_ASM_WRITTEN on the type.

* g++.dg/debug/pr101533.C: New testcase.

(cherry picked from commit 99a3f013c3bb8bc022ca488b40aa18fd97b5224d)

middle-end/101478 - ICE with degenerate address during gimplification

When we gimplify &MEM[0B + 4] we are re-folding the address in case
types are not canonical which ends up with a constant address that
recompute_tree_invariant_for_addr_expr ICEs on. Properly guard
that call.

PR middle-end/101478
* gimplify.cc (gimplify_addr_expr): Check we still have an
ADDR_EXPR before calling recompute_tree_invariant_for_addr_expr.

* gcc.dg/pr101478.c: New testcase.

(cherry picked from commit 33ead6400ad59d4b38fa0527a9a7b53a28114ab7)

tree-optimization/98845 - ICE with tail-merging and DCE/DSE disabled

The following shows that tail-merging will make dead SSA defs live
in paths where it wasn't before, possibly introducing UB or as
in this case, uses of abnormals that eventually fail coalescing
later. The fix is to register such defs for stmt comparison.

PR tree-optimization/98845
* tree-ssa-tail-merge.cc (stmt_local_def): Consider a
def with no uses not local.

* gcc.dg/pr98845.c: New testcase.
* gcc.dg/pr81192.c: Adjust.

(cherry picked from commit 6b8a8c9fd68c5dabaec5ddbc25efeade44f37a14)

lto/91299 - weak definition inlined with LTO

The following fixes a thinko in the handling of interposed weak
definitions which confused the interposition check in
get_availability by setting DECL_EXTERNAL too early.

PR lto/91299
gcc/lto/
* lto-symtab.cc (lto_symtab_merge_symbols): Set DECL_EXTERNAL
only after calling get_availability.

gcc/testsuite/
* gcc.dg/lto/pr91299_0.c: New testcase.
* gcc.dg/lto/pr91299_1.c: Likewise.

(cherry picked from commit bc34db5b12e008f6ec4fdf4ebd22263c8617e5e3)

tree-optimization/87984 - hard register assignments not preserved

The following disables redundant store elimination to hard register
variables which isn't valid.

PR tree-optimization/87984
* tree-ssa-dom.cc (dom_opt_dom_walker::optimize_stmt): Do
not perform redundant store elimination to hard register
variables.
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt):
Likewise.

* gcc.target/i386/pr87984.c: New testcase.

(cherry picked from commit 535115caaf97f5201fb528f67f15b4c52be5619d)

c++/79786 - bogus invocation of DATA_ABI_ALIGNMENT macro

The first argument is supposed to be a type, not a decl.

PR c++/79786
gcc/cp/
* rtti.cc (emit_tinfo_decl): Fix DATA_ABI_ALIGNMENT invocation.

(cherry picked from commit 6ec19825b4e72611cdbd4749feed67b61392aa81)

middle-end/66279 - gimplification clobbers shared asm constraints

When the C++ frontend clones a CTOR we do not copy ASM_EXPR constraints
fully as walk_tree does not recurse to TREE_PURPOSE of TREE_LIST nodes.
At this point doing that seems too dangerous, so the following instead
keeps gimplification of ASM_EXPRs from clobbering the shared constraints
by unsharing them there, just as it already unshares TREE_VALUE when it
rewrites a "+" output constraint into a separate "=" output and matching
input constraint.
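
At the source level, the "+" rewrite mentioned above corresponds to the following equivalence. This is a sketch using GNU extended asm with an empty template, so it assumes GCC or Clang; the function names are hypothetical and this is not the testcase from the PR.

```cpp
// Read-write operand expressed with a single "+" constraint.
int passthrough_plus(int x) {
  asm volatile("" : "+r"(x));
  return x;
}

// The same operand split into an "=" output and a matching input "0",
// which is the form the gimplifier produces internally.
int passthrough_split(int x) {
  asm volatile("" : "=r"(x) : "0"(x));
  return x;
}
```

Both functions return their argument unchanged; the constraint strings differ, which is why rewriting a shared constraint in place affects every clone that refers to it.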

PR middle-end/66279
* gimplify.cc (gimplify_asm_expr): Copy TREE_PURPOSE before
rewriting it for "+" processing.

* g++.dg/pr66279.C: New testcase.

(cherry picked from commit 95f5d6cc17e7d6b689674756c62b6b5e1284afd0)

Daily bump.

c++: fix missing lifetime extension [PR119383]

Since r15-8011 cp_build_indirect_ref_1 no longer does the *&TARGET_EXPR ->
TARGET_EXPR folding, so as not to change its value category. That fix seems
correct but it made us stop extending the lifetime in this testcase,
causing a wrong-code issue -- extend_ref_init_temps_1 did not see
through the extra *& because it doesn't use a tree walk.

This patch reverts r15-8011 and instead handles the problem in
build_over_call by calling force_lvalue in the is_really_empty_class
case as well as in the general case.

PR c++/119383

gcc/cp/ChangeLog:

* call.cc (build_over_call): Use force_lvalue to ensure op= returns
an lvalue.
* cp-tree.h (force_lvalue): Declare.
* cvt.cc (force_lvalue): New.
* typeck.cc (cp_build_indirect_ref_1): Revert r15-8011.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/temp-extend3.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit e9803f10c9f376f6d091e7ef3ad6e1c92e7c8e8c)

OpenMP: Reorder diagnostic in modify_call_for_omp_dispatch [PR119559]

gcc/ChangeLog:

PR middle-end/119559
* gimplify.cc (modify_call_for_omp_dispatch): Reorder checks to avoid
asserts and bogus diagnostic.

(cherry picked from commit de92ac6f11e605987421fe1443b5b81ff172dbb6)

Fix a pasto in ao_compare::compare_ao_refs

When reading the function ao_compare::compare_ao_refs I came across
what I believe to be a copy-and-paste error which this patch fixes.

gcc/ChangeLog:

2025-03-10 Martin Jambor <mjambor@suse.cz>

* tree-ssa-alias.cc (ao_compare::compare_ao_refs): Fix a
copy-and-paste error.

(cherry picked from commit dc47161c1f32c3f27d1157ba0de9d98ea1b7fc82)

libstdc++: Avoid bogus -Walloc-size-larger-than warning in test [PR116212]

The compiler can't tell that the vector size fits in int, so it thinks
it might overflow to a negative value, which would then be a huge
positive size_t. In reality, the vector size never exceeds five.

There's no warning on trunk, so just change the local variable to use
type unsigned so that we get rid of the warning on the branches.
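
The conversion the compiler is worried about can be shown in isolation. This is a sketch only, not the code from the test; `to_alloc_size` and `exceeds_default_limit` are hypothetical helpers, and the comparison against PTRDIFF_MAX assumes the documented default limit of -Walloc-size-larger-than.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: the size an allocation sees when an int element
// count is converted to size_t. A negative count wraps to a huge value.
inline std::size_t to_alloc_size(int count)
{
  return static_cast<std::size_t>(count);
}

// With an unsigned count there is no negative value that could wrap.
inline std::size_t to_alloc_size(unsigned count)
{
  return count;
}

// Assumed default limit for -Walloc-size-larger-than: PTRDIFF_MAX.
inline bool exceeds_default_limit(std::size_t size)
{
  return size > static_cast<std::size_t>(
                  std::numeric_limits<std::ptrdiff_t>::max());
}
```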

libstdc++-v3/ChangeLog:

PR libstdc++/116212
* testsuite/20_util/specialized_algorithms/uninitialized_move/constrained.cc:
Use unsigned for vector size.

ipa: Do not modify cgraph edges from thunk clones during inlining (PR116572)

In PR 116572 we hit an assert that a thunk which does not have a body
looks like it has one.  It does not, but the call_stmt of its outgoing
edge points to a statement, which it should not.  In fact it has several
outgoing call graph edges, which cannot be.  The problem is that the
code updating the edges to reflect inlining into the master clone (an
ex-thunk, unlike the clone, which is still an unexpanded thunk) also
updates the edges of the clone during inlining into the master clone.
This patch simply makes the code skip unexpanded thunk clones.

gcc/ChangeLog:

2025-03-13  Martin Jambor  <mjambor@suse.cz>

PR ipa/116572
* cgraph.cc (cgraph_update_edges_for_call_stmt): Do not update
edges of clones that are unexpanded thunk.  Assert that the node
passed as the parameter is not an unexpanded thunk.

gcc/testsuite/ChangeLog:

2025-03-13  Martin Jambor  <mjambor@suse.cz>

PR ipa/116572
* g++.dg/ipa/pr116572.C: New test.

(cherry picked from commit 075ec330307c5b1fe5ed166a633c718c06b01437)

libstdc++: Add ranges::range_common_reference_t for C++20 (LWG 3860)

LWG 3860 added this alias template. Both libc++ and MSVC treat this as a
DR for C++20, so this change does so too.

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h (range_common_reference_t): New
alias template, as per LWG 3860.
* testsuite/std/ranges/range.cc: Check it.

(cherry picked from commit 92b554a8412624a0aa3ca9b502976ebec7eff34e)

libstdc++: Check feature test macro for associative container node extraction

Replace some `__cplusplus > 201402L` preprocessor checks with more
expressive checks for the appropriate feature test macro.
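
`__glibcxx_node_extract` is internal to libstdc++. User code that wants the same kind of check would test the standard `__cpp_lib_node_extract` macro instead. A minimal sketch, assuming C++17 and a hypothetical helper name `rename_key`:

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: change the key of one element. Uses node extraction
// when the library advertises it, otherwise falls back to erase + emplace.
inline bool rename_key(std::map<int, std::string>& m, int from, int to)
{
  auto it = m.find(from);
  if (it == m.end() || m.count(to) != 0)
    return false;
#if defined(__cpp_lib_node_extract)
  auto node = m.extract(it);     // unlinks the node, value is not copied
  node.key() = to;
  m.insert(std::move(node));
#else
  std::string value = std::move(it->second);
  m.erase(it);
  m.emplace(to, std::move(value));
#endif
  return true;
}
```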

libstdc++-v3/ChangeLog:

* include/bits/stl_map.h: Check __glibcxx_node_extract instead
of __cplusplus.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/stl_tree.h: Likewise.

(cherry picked from commit 408f5b847b5b4e552274dc7b02ccaf106395936d)

libstdc++: Add static_assert to std::packaged_task::packaged_task(F&&)

LWG 4154 (approved in Wrocław, November 2024) fixed the Mandates:
precondition for std::packaged_task::packaged_task(F&&) to match what
the implementation actually requires. We already gave a diagnostic in
the right cases as required by the issue resolution, so strictly
speaking we don't need to do anything. But the current diagnostic comes
from inside the implementation of std::__invoke_r and could be more
user-friendly.

For C++17 (when std::is_invocable_r_v is available) add a static_assert
to the constructor, so the error is clear:

.../include/c++/15.0.1/future: In instantiation of 'std::packaged_task<_Res(_ArgTypes ...)>::packaged_task(_Fn&&) [with _Fn = const F&; <template-parameter-2-2> = void; _Res = void; _ArgTypes = {}]':
lwg4154_neg.cc:15:31:   required from here
   15 | std::packaged_task<void()> p(f); // { dg-error "here" "" { target c++17 } }
      |                               ^
.../include/c++/15.0.1/future:1575:25: error: static assertion failed
1575 |           static_assert(is_invocable_r_v<_Res, decay_t<_Fn>&, _ArgTypes...>);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Also add a test to confirm we get a diagnostic as the standard requires.
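
The condition being asserted can be tried outside of std::packaged_task. The sketch below is not lwg4154_neg.cc; `RvalueOnly`, `Plain` and `storable_in_packaged_task` are hypothetical names, and only the is_invocable_r_v expression mirrors the static_assert quoted above.

```cpp
#include <type_traits>

// Hypothetical callable that can only be invoked on an rvalue.
struct RvalueOnly
{
  void operator()() && {}
};

// Hypothetical callable with an ordinary call operator.
struct Plain
{
  void operator()() {}
};

// Same shape as the asserted condition: the decayed callable is stored and
// later invoked as an lvalue.
template<typename Res, typename Fn, typename... Args>
inline constexpr bool storable_in_packaged_task
  = std::is_invocable_r_v<Res, std::decay_t<Fn>&, Args...>;

static_assert(storable_in_packaged_task<void, Plain>);
static_assert(!storable_in_packaged_task<void, RvalueOnly>);
static_assert(!storable_in_packaged_task<void, const RvalueOnly&>);
```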

libstdc++-v3/ChangeLog:

* include/std/future (packaged_task::packaged_task(F&&)): Add
static_assert.
* testsuite/30_threads/packaged_task/cons/dangling_ref.cc: Add
dg-error for new static assertion.
* testsuite/30_threads/packaged_task/cons/lwg4154_neg.cc: New
test.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 4d2683b04fd329c97e3da09498345fe3ee00455f)

libstdc++: Add testcase for std::filesystem::copy [PR118699]

This was fixed last year by r15-2409-g017e3f89b081e4 (and backports), so
just add the testcase.

libstdc++-v3/ChangeLog:

PR libstdc++/118699
* testsuite/27_io/filesystem/operations/copy.cc: Check copying a
file to a directory.

(cherry picked from commit 466da4baba46608882d16d121fa46d33f841bc7b)

Daily bump.

d: Fix error with -Warray-bounds and -O2 [PR117002]

The record layout of class types in D doesn't get any tail padding, so it
is possible for the `classInstanceSize' to not be a multiple of the
`classInstanceAlignment'.

Rather than setting the instance alignment on the underlying
RECORD_TYPE, instead give the type an alignment of 1, which will mark it
as TYPE_PACKED. The value of `classInstanceAlignment' is instead
applied to the DECL_ALIGN of both the static `init' symbol, and the
stack allocated variable used when generating `new' for a `scope' class.

PR d/117002

gcc/d/ChangeLog:

* decl.cc (aggregate_initializer_decl): Set explicit decl alignment of
class instance.
* expr.cc (ExprVisitor::visit (NewExp *)): Likewise.
* types.cc (TypeVisitor::visit (TypeClass *)): Mark the record type of
classes as packed.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/pr117002.d: New test.

(cherry picked from commit 9fadadbbbc2b5352e5e70e0e1a9be9b447176913)

OpenMP: modify_call_for_omp_dispatch - fix invalid memory access after 'error' [PR119541]

OpenMP requires that the number of dispatch 'interop' clauses (ninterop)
is less than or equal to the number of declare variant 'append_args' interop
objects (nappend).

While 'nappend < ninterop' was diagnosed as an error, the processing
continued, which led to an invalid out-of-bounds memory access. Solution:
only process the first nappend 'interop' clauses.
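
The shape of that fix, clamping the loop to the smaller count after the error has been reported, can be sketched as follows. This is an illustration only and not the gimplify.cc code; `fill_append_slots` and its vector parameters are hypothetical stand-ins.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: copy 'interop' clause values into 'append_args'
// slots, never touching more slots than exist.
inline std::size_t fill_append_slots(const std::vector<int>& interop_clauses,
                                     std::vector<int>& append_slots)
{
  // Process only the first nappend clauses, even if ninterop is larger.
  const std::size_t n = std::min(interop_clauses.size(), append_slots.size());
  for (std::size_t i = 0; i < n; ++i)
    append_slots[i] = interop_clauses[i];
  return n;
}
```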

gcc/ChangeLog:

PR middle-end/119541
* gimplify.cc (modify_call_for_omp_dispatch): Limit interop clauses
processing by the number of append_args arguments.

(cherry picked from commit f3899e0fd3f9aa6b579a21e87b50c61ea5c448df)

libstdc++: Add testcases for resolved bug [PR101527]

These tests were fixed by a front-end change r13-465-g4df735e01e3199 so
this just adds them to the testsuite to be sure we don't regress.

libstdc++-v3/ChangeLog:

PR libstdc++/101527
* testsuite/24_iterators/common_iterator/101527.cc: New test.
* testsuite/24_iterators/counted_iterator/101527.cc: New test.

(cherry picked from commit f7c0b0fc4fdeaf034dc38356830625f7280d325d)

libstdc++: Remove stray comma in testing docs

libstdc++-v3/ChangeLog:

* doc/xml/manual/test.xml: Remove stray comma.
* doc/html/manual/test.html: Regenerate.

(cherry picked from commit ac16d6d74fcb4ca10c939b00782b4dfada666273)

libstdc++: Make std::erase for linked lists convert to bool

LWG 4135 (approved in Wrocław, November 2024) fixes the lambda
expressions used by std::erase for std::list and std::forward_list.
Previously they attempted to copy something that isn't required to be
copyable. Instead they should convert it to bool right away.

The issue resolution also changes the lambda's parameter to be const, so
that it can't modify the elements while comparing them.
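
A predicate of that shape, written against C++17 std::list::remove_if, might look like the sketch below. `erase_value` is a hypothetical helper, not the library's std::erase.

```cpp
#include <cstddef>
#include <list>

// Hypothetical helper with a predicate shaped like the one described above:
// const parameter, explicit bool return type, so the result of == is
// converted to bool immediately instead of being copied.
template<typename T, typename U>
std::size_t erase_value(std::list<T>& cont, const U& value)
{
  const std::size_t before = cont.size();
  cont.remove_if([&](const T& elem) -> bool { return elem == value; });
  return before - cont.size();
}
```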

libstdc++-v3/ChangeLog:

* include/std/forward_list (erase): Change lambda to have
explicit return type and const parameter type.
* include/std/list (erase): Likewise.
* testsuite/23_containers/forward_list/erasure.cc: Check lambda
is correct.
* testsuite/23_containers/list/erasure.cc: Likewise.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
(cherry picked from commit e6e7b477bbdbfb3fee6b44087a59f94fd1e2c7a3)

libstdc++: Fix some broken links in the manual

libstdc++-v3/ChangeLog:

* doc/xml/manual/policy_data_structures_biblio.xml: Fix two
broken links.
* doc/html/manual/policy_data_structures.html: Regenerate.

(cherry picked from commit 1e4d81aab2542f529d23329fcc5e642eedd617d9)

doc: Fix minor grammar nit in -ftrivial-auto-var-init docs

gcc/ChangeLog:

* doc/extend.texi (Common Variable Attributes): Fix grammar in
final sentence of -ftrivial-auto-var-init description.

(cherry picked from commit f695d0392ffc82298d55474cd3025aec26db04ec)

libstdc++: Optimize std::vector construction from input iterators [PR108487]

LWG 3291 made std::ranges::iota_view's iterator have input_iterator_tag
as its iterator_category, even though it satisfies the C++20
std::forward_iterator concept. This means that the traditional
std::vector::vector(InputIterator, InputIterator) constructor treats
iota_view iterators as input iterators, because it only understands the
C++17 iterator requirements, not the C++20 iterator concepts. This
results in a loop that calls emplace_back for each individual element of
the iota_view, requiring the vector to reallocate repeatedly as the
values are inserted. This makes it unnecessarily slow to construct a
vector from an iota_view.

This change adds a new _M_range_initialize_n function for initializing a
vector from a range (which doesn't have to be common) and a size. This
new function can be used by vector(InputIterator, InputIterator) when
std::ranges::distance can be used to get the size. It can also be used
by the _M_range_initialize overload that gets the size for a
Cpp17ForwardIterator pair using std::distance, and by the
vector(initializer_list) constructor.

With this new function constructing a std::vector from iota_view does
a single allocation of the correct size and so doesn't need to
reallocate in a loop.
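
The idea of sizing the storage once before copying can be sketched in user code. `make_vector_n` below is a hypothetical helper and does not reproduce the internal _M_range_initialize_n; it only shows why a known element count avoids repeated reallocation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: build a vector from a single-pass iterator plus a
// known element count, allocating once up front.
template<typename T, typename InputIt>
std::vector<T> make_vector_n(InputIt first, std::size_t n)
{
  std::vector<T> v;
  v.reserve(n);                    // one allocation of the right size
  for (std::size_t i = 0; i < n; ++i, ++first)
    v.emplace_back(*first);        // capacity is already n, no reallocation
  return v;
}
```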

libstdc++-v3/ChangeLog:

PR libstdc++/108487
* include/bits/stl_vector.h (vector(initializer_list)): Call
_M_range_initialize_n instead of _M_range_initialize.
(vector(InputIterator, InputIterator)): Use _M_range_initialize_n
for C++20 sized sentinels and forward iterators.
(vector::_M_range_initialize(FwIt, FwIt, forward_iterator_tag)):
Use _M_range_initialize_n.
(vector::_M_range_initialize_n): New function.
* testsuite/23_containers/vector/cons/108487.cc: New test.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/initlist-opt1.C: Match _M_range_initialize_n
instead of _M_range_initialize.
* g++.dg/tree-ssa/initlist-opt2.C: Likewise.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit e200f53a5556516ec831e6b7a34aaa0f10a4ab0a)

libstdc++: Update tzdata to 2025b

Import the new 2025b tzdata.zi file.

libstdc++-v3/ChangeLog:

* src/c++20/tzdata.zi: Import new file from 2025b release.

(cherry picked from commit fd3bb314052f04f9357b4dce89fcb61ecfd3a83b)

libstdc++: Update tzdata to 2025a

Import the new 2025a tzdata.zi file. The leapseconds file was also
updated to have a new expiry (no new leap seconds were added).

libstdc++-v3/ChangeLog:

* include/std/chrono (__detail::__get_leap_second_info): Update
expiry date for leap seconds list.
* src/c++20/tzdata.zi: Import new file from 2025a release.
* src/c++20/tzdb.cc (tzdb_list::_Node::_S_read_leap_seconds):
Update expiry date for leap seconds list.

(cherry picked from commit 0ce4c1c48564e465a331c100e757e2258b4c632a)

LoongArch: doc: Add same-address constraint to the description of '-mld-seq-sa'.

gcc/ChangeLog:

* doc/invoke.texi: Modify the description of '-mld-seq-sa'.

(cherry picked from commit 8ad8f74972923baaaaf2b6d6291d31ef53c4ded4)

Daily bump.

OpenACC: array reductions bug fixes

This is a merge of the v4 to v5 diff patch from:
https://gcc.gnu.org/pipermail/gcc-patches/2025-March/679682.html

This patch fixes issues found for NVPTX sm_70 testing, and another issue
related to copying to reduction buffer for worker/vector mode.

gcc/ChangeLog:

* config/gcn/gcn-tree.cc (gcn_goacc_reduction_setup): Fix array case
copy source into reduction buffer.
* config/nvptx/nvptx.cc (nvptx_expand_shared_addr): Move default size
init setting place.
(enum nvptx_builtins): Add NVPTX_BUILTIN_BAR_WARPSYNC.
(nvptx_init_builtins): Add DEF() of nvptx_builtin_bar_warpsync.
(nvptx_expand_builtin): Expand NVPTX_BUILTIN_BAR_WARPSYNC.
(nvptx_goacc_reduction_setup): Fix array case copy source into reduction
buffer.
(nvptx_goacc_reduction_fini): Add bar.warpsync at end of vector-mode
reductions for sm_70 and above.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-2.c: Adjust test.
* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-5.c: Likewise.

Fix type compatibility for types with flexible array member 2/2 [PR113688,PR114713,PR117724]

For checking or computing TYPE_CANONICAL, ignore the array size when it is
the last element of a structure or union. To not get errors because of
an inconsistent number of members, zero-sized arrays which are the last
element are not ignored anymore when checking the fields of a struct.

PR c/113688
PR c/114014
PR c/114713
PR c/117724

gcc/ChangeLog:
* tree.cc (gimple_canonical_types_compatible_p): Add exception.

gcc/lto/ChangeLog:
* lto-common.cc (hash_canonical_type): Add exception.

gcc/testsuite/ChangeLog:
* gcc.dg/pr113688.c: New test.
* gcc.dg/pr114014.c: New test.
* gcc.dg/pr114713.c: New test.
* gcc.dg/pr117724.c: New test.

(cherry picked from commit d46c7f313b5a30ee04080f249e31e12987d50aa2)

Fix type compatibility for types with flexible array member 1/2 [PR113688,PR114713,PR117724]

Allow the TYPE_MODE of a type with an array as last member to differ from
another compatible type.

gcc/ChangeLog:
* tree.cc (gimple_canonical_types_compatible_p): Add exception.
(verify_type): Add exception.

gcc/lto/ChangeLog:
* lto-common.cc (hash_canonical_type): Add exception.

(cherry picked from commit 1f48225a0ddfaf74a229105343b22f3086c4b8cb)

gcc/testsuite/g++.dg/gomp/append-args-8.C: Fix scan-dump-tree

gcc/testsuite/ChangeLog:

* g++.dg/gomp/append-args-8.C: Remove bogus '3' after \.\[0-9\]+
pattern.

(cherry picked from commit e0886d8ad4c51919c349d0b31f2bec3acbc79e14)

Daily bump.

Reuse scratch registers generated by LRA

Test file: udivmoddi.c
problem insn: 484

Before LRA pass we have:
(insn 484 483 485 72 (parallel [
            (set (reg/v:SI 143 [ __q1 ])
                (plus:SI (reg/v:SI 143 [ __q1 ])
                    (const_int -2 [0xfffffffffffffffe])))
            (clobber (scratch:QI))
        ]) "udivmoddi.c":163:405 discrim 5 186 {addsi3}
     (nil))

LRA substitutes all scratches with new pseudos, so we have:
(insn 484 483 485 72 (parallel [
            (set (reg/v:SI 143 [ __q1 ])
                (plus:SI (reg/v:SI 143 [ __q1 ])
                    (const_int -2 [0xfffffffffffffffe])))
            (clobber (reg:QI 619))
        ]) "/mnt/d/avr-lra/udivmoddi.c":163:405 discrim 5 186 {addsi3}
     (expr_list:REG_UNUSED (reg:QI 619)
        (nil)))

Pseudo 619 is a special scratch register generated by LRA which is marked in `scratch_bitmap' and can be tested by calling `ira_former_scratch_p(regno)'.

In dump file (udivmoddi.c.317r.reload) we have:
      Creating newreg=619
Removing SCRATCH to p619 in insn #484 (nop 3)
rescanning insn with uid = 484.

After that LRA tries to spill (reg:QI 619).
This is a bug because (reg:QI 619) is an output scratch register, which already acts much like a spill register.

Fragment from udivmoddi.c.317r.reload:
      Choosing alt 2 in insn 484:  (0) r  (1) 0  (2) nYnn  (3) &d {addsi3}
      Creating newreg=728 from oldreg=619, assigning class LD_REGS to r728

IMHO: the bug is in lra-constraints.cc in function `get_reload_reg'
fragment of `get_reload_reg':
  if (type == OP_OUT)
    {
      /* Output reload registers tend to start out with a conservative
choice of register class.  Usually this is ALL_REGS, although
a target might narrow it (for performance reasons) through
targetm.preferred_reload_class.  It's therefore quite common
for a reload instruction to require a more restrictive class
than the class that was originally assigned to the reload register.

In these situations, it's more efficient to refine the choice
of register class rather than create a second reload register.
This also helps to avoid cycling for registers that are only
used by reload instructions.  */
      if (REG_P (original)
  && (int) REGNO (original) >= new_regno_start
  && INSN_UID (curr_insn) >= new_insn_uid_start
__________________________________^^
  && in_class_p (original, rclass, &new_class, true))
{
  unsigned int regno = REGNO (original);
  if (lra_dump_file != NULL)
    {
      fprintf (lra_dump_file, " Reuse r%d for output ", regno);
      dump_value_slim (lra_dump_file, original, 1);
    }

This condition incorrectly limits register reuse to ONLY newly generated instructions,
i.e. LRA can reuse registers only from insns it generated itself.

IMHO: It's wrong.
Scratch registers generated by LRA also have to be reused.

The patch is very simple.
On x86_64, it bootstraps+regtests fine.

gcc/
PR target/116550
PR target/119340
* lra-constraints.cc (get_reload_reg): Reuse scratch registers
generated by LRA.

(cherry picked from commit e7393cbb5f2cae50b42713e71984064073aa378a)

Daily bump.

Merge commit '8a624a127990aee47d02b3d64892f8de9031975e' into HEAD

Daily bump.

widening_mul: Fix up further r14-8680 widening mul issues [PR119417]

The following testcase is miscompiled since r14-8680 PR113560 changes.
I've already tried to fix some of the issues caused by that change in
r14-8823 PR113759, but apparently didn't get it right.

The problem is that the r14-8680 changes sometimes set *type_out to
a narrower type than the *new_rhs_out actually has (because it will
handle stuff like _1 = rhs1 & 0xffff; and imply from that HImode type_out).

Now, if in convert_mult_to_widen or convert_plusminus_to_widen we actually
get optab for the modes we've asked for (i.e. with from_mode and to_mode),
everything works fine, if the operands don't have the expected types,
they are converted to those (for INTEGER_CSTs with fold_convert,
otherwise with build_and_insert_cast).
On the following testcase on aarch64 that is not the case, we ask
for from_mode HImode and to_mode DImode, but get actual_mode SImode.
The mult_rhs1 operand already has SImode and we change type1 to unsigned int
and so no cast is actually done, except that the & 0xffff is lost that way.

The following patch ensures that if we change typeN because of wider
actual_mode (or because of a sign change), we first cast to the old
typeN (if the r14-8680 code was encountered, otherwise it would have the
same precision) and only then change it, and then perhaps cast again.

On the testcase on aarch64-linux the patch results in the expected
-       add     x19, x19, w0, uxtw 1
+       add     x19, x19, w0, uxth 1
difference.
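
In source terms, the difference between the two instructions is whether the 16-bit mask survives: `uxth` zero-extends the low 16 bits of w0, `uxtw` the full 32 bits. The function below is a hypothetical reduction for illustration and not gcc.dg/torture/pr119417.c.

```cpp
#include <cstdint>

// Hypothetical reduction: only the low 16 bits of w may contribute.
// Dropping the mask would give acc + w * 2 over all 32 bits.
constexpr std::uint64_t add_scaled_low16(std::uint64_t acc, std::uint32_t w)
{
  const std::uint32_t low = w & 0xffff;             // the "& 0xffff" that must not be lost
  return acc + static_cast<std::uint64_t>(low) * 2; // widening multiply-add
}

static_assert(add_scaled_low16(0, 0x12345u) == 0x468a);
```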

2025-03-26  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/119417
* tree-ssa-math-opts.cc (convert_mult_to_widen): Before changing
typeN because actual_precision/from_unsignedN differs cast rhsN
to typeN if it has a different type.
(convert_plusminus_to_widen): Before changing
typeN because actual_precision/from_unsignedN differs cast mult_rhsN
to typeN if it has a different type.

* gcc.dg/torture/pr119417.c: New test.

(cherry picked from commit 02132faf4e2fb604758aa86f0b097e6871be595a)

i386: Require in peephole2 that memory is offsettable [PR119450]

The following testcase ICEs because a peephole2 attempts to offset
memory which is not offsettable (in particular address is a ZERO_EXTEND
in this case).

Because peephole2s don't support constraints, I've added a check for this
in the peephole2's condition.

2025-03-26 Jakub Jelinek <jakub@redhat.com>

PR target/119450
* config/i386/i386.md (narrow test peephole2): Test for
offsettable_memref_p in condition.

* gcc.target/i386/pr119450.c: New test.

(cherry picked from commit 84f0b648aeb053b3bd8e1cb6fe282f4da4143708)

Fix up some further cases of missing or extraneous spaces in diagnostics

Given the recent PR119406 I've tried to grep for concatenated string
literals without space at the end of one line and at the start of next line,
unless it was obviously intentional.
Furthermore, I've then looked through gcc.pot looking for 2 adjacent spaces
and looking back if that wasn't the case of "something "
" with spaces at both sides".

Here is the result from that.

I think just the c.opt change needs an explanation, the "" in the
description is simply eaten up somewhere during the option processing and
gcc -v --help before this patch was displaying
-Wdeprecated-literal-operator Warn about deprecated space between and suffix in a user-defined literal operator.

2025-03-22 Jakub Jelinek <jakub@redhat.com>

gcc/
* gimplify.cc (warn_switch_unreachable_and_auto_init_r): Add missing
space in the middle of diagnostics.
* tree-vect-stmts.cc (vectorizable_load): Add missing space in the
middle of debug dump message.
gcc/fortran/
* resolve.cc (resolve_procedure_expression): Remove extraneous space
from the middle of diagnostics.

(cherry picked from commit 20360e4b6b5a63bc65d1855a7ecf22eb7148a452)

c++: Evaluate immediate invocation call arguments with mce_true [PR119150]

Since Marek's r14-4140 which moved immediate invocation evaluation
from build_over_call to cp_fold_r, the following testcase is miscompiled.

The a = foo (bar ()); case is actually handled right, that is handled
in cp_fold_r and the whole CALL_EXPR is at that point evaluated by
cp_fold_immediate_r with cxx_constant_value (stmt, tf_none);
and that uses mce_true for evaluation of the argument as well as the actual
call.

But in the bool b = foo (bar ()); case we actually try to evaluate this
as non-manifestly constant-evaluated.  And while
          /* Make sure we fold std::is_constant_evaluated to true in an
             immediate function.  */
          if (DECL_IMMEDIATE_FUNCTION_P (fun))
            call_ctx.manifestly_const_eval = mce_true;
ensures that if consteval and __builtin_is_constant_evaluated () is true
inside of that call, this happens after arguments to the function
have been already constant evaluated in cxx_bind_parameters_in_call.
The call_ctx in that case also includes new call_ctx.call, something that
shouldn't be used for the arguments, so the following patch just arranges
to call cxx_bind_parameters_in_call with manifestly_constant_evaluated =
mce_true.

2025-03-13  Jakub Jelinek  <jakub@redhat.com>

PR c++/119150
* constexpr.cc (cxx_eval_call_expression): For
DECL_IMMEDIATE_FUNCTION_P (fun) set manifestly_const_eval in new_ctx
and new_call to mce_true and set ctx to &new_ctx.

* g++.dg/cpp2a/consteval41.C: New test.

(cherry picked from commit ebf6e6241f5658a3cae462b1314f4a8f2bc71760)

builtins: Fix up strspn/strcspn folding [PR119219]

The PR119204 r15-7955 fix caused some regressions.
The problem is that the fold_builtin* APIs document that expr is
either a CALL_EXPR of the call or NULL, so using TREE_TYPE (expr)
can crash e.g. during constexpr evaluation etc.

As can be seen in the surrounding patch, for the neighbouring builtins
(both modf and strpbrk) fold_builtin_2 passes down type, which is the
result type, TREE_TYPE (TREE_TYPE (fndecl)) and those builtins use it
to build the return value, while strspn was always building size_type_node
and strcspn had this change from that to TREE_TYPE (expr).
The patch passes type to these two and uses it there as well.

The patch keeps passing expr because it is used in the
check_nul_terminated_array calls done for both strspn and strcspn,
those calls clearly can deal with NULL expr but prefer if it is non-NULL
for some warning.

2025-03-12 Jakub Jelinek <jakub@redhat.com>

PR middle-end/119204
PR middle-end/119219
* builtins.cc (fold_builtin_2): Pass type as another argument
to fold_builtin_strspn and fold_builtin_strcspn.
(fold_builtin_strspn): Add type argument, use it instead of
size_type_node.
(fold_builtin_strcspn): Add type argument, use it instead of
TREE_TYPE (expr).

(cherry picked from commit da967f0ff324053304b350fdb18384607a346ebd)

middle-end/119204 - ICE with strcspn folding

The following makes sure to convert the folded expression to the
original expression type.

PR middle-end/119204
* builtins.cc (fold_builtin_strcspn): Preserve the original
expression type.

* gcc.dg/pr119204.c: New testcase.

(cherry picked from commit 68932eeb38f66fbc0c3cf4b77ff7dde8a408f2e4)

tree: Improve skip_simple_arithmetic [PR119183]

The following testcase takes a very long time to compile, because
skip_simple_arithmetic decides to first call tree_invariant_p on
the second argument (and indirectly recurse there).  I think before
canonicalization of operands for commutative binary expressions
(and for non-commutative ones always) it is pretty common that the
first operand is a constant, something which tree_invariant_p handles
immediately, so the following patch special cases that; I've added
there a tree_invariant_p call too after the checks, while it is not
really needed currently, tree_invariant_p has the same checks, I wanted
to be prepared in case tree_invariant_p changes.  But if you think
I should avoid it, I can drop it too.

This is just a partial fix, I think one can certainly construct a testcase
which will still have horrible compile time complexity (but I've tried and
haven't managed to do so), so perhaps we should just limit the recursion
depth through skip_simple_arithmetic/tree_invariant_p with some defaulted
argument.

2025-03-11  Jakub Jelinek  <jakub@redhat.com>

PR c/119183
* tree.cc (skip_simple_arithmetic): If first operand of binary
expr is TREE_CONSTANT or TREE_READONLY with no side-effects, call
tree_invariant_p on that operand first instead of on the second.

* gcc.dg/pr119183.c: New test.

(cherry picked from commit 20e5aa9cc1519f871cce25dbfdc149d9d60da779)

libgcc: Fix up unwind-dw2-btree.h [PR119151]

The following testcase shows a bug in unwind-dw2-btree.h.
In short, the header provides lock-free btree data structure (so no parent
link on nodes, both insertion and deletion are done in top-down walks
with some locking of just a few nodes at a time so that lookups can notice
concurrent modifications and retry, non-leaf (inner) nodes contain keys
which are initially the base address of the left-most leaf entry of the
following child (or all ones if there is none) minus one, insertion ensures
balancing of the tree to ensure [d/2, d] entries filled through aggressive
splitting if it sees a full tree while walking, deletion performs various
operations like merging neighbour trees, merging into parent or moving some
nodes from neighbour to the current one).
What differs from the textbook implementations is mostly that the leaf nodes
don't include just address as a key, but address range, address + size
(where we don't insert any ranges with zero size) and the lookups can be
performed for any address in the [address, address + size) range.  The keys
on inner nodes are still just address-1, so the child covers all nodes
where addr <= key unless it is covered already in children to the left.
The user (static executables or JIT) should always ensure there is no
overlap in between any of the ranges.

In the testcase a bunch of insertions are done, always followed by one
removal, followed by one insertion of a range slightly different from the
removed one.  E.g. in the first case [&code[0x50], &code[0x59]] range
is removed and then we insert [&code[0x4c], &code[0x53]] range instead.
This is valid, it doesn't overlap anything.  But the problem is that some
non-leaf (inner) one used the &code[0x4f] key (after the 11 insertions
completely correctly).  On removal, nothing adjusts the keys on the parent
nodes (it really can't in the top-down only walk, the keys could be many nodes
above it and unlike insertion, removal only knows the start address, doesn't
know the removed size and so will discover it only when reaching the leaf
node which contains it; plus even if it knew the address and size, it still
doesn't know what the second left-most leaf node will be (i.e. the one after
removal)).  And on insertion, if nodes aren't split at a level, nothing
adjusts the inner keys either.  If a range is inserted and is either fully
below key (keys are - 1, so having address + size - 1 being equal to key is
fine) or fully after key (i.e. address > key), it works just fine, but if
the key is in a middle of the range like in this case, &code[0x4f] is in the
middle of the [&code[0x4c], &code[0x53]] range, then insertion works fine
(we only use size on the leaf nodes), and lookup of the addresses below
the key work fine too (i.e. [&code[0x4c], &code[0x4f]] will succeed).
The problem is with lookups after the key (i.e. [&code[0x50], &code[0x53]]),
the lookup looks for them in different children of the btree and doesn't
find an entry and returns NULL.

As users need to ensure non-overlapping entries at any time, the following
patch fixes it by adjusting keys during insertion where we know not just
the address but also size; if we find during the top-down walk a key
which is in the middle of the range being inserted, we simply increase the
key to be equal to address + size - 1 of the range being inserted.
There can't be any existing leaf nodes overlapping the range in correct
programs and the btree rebalancing done on deletion ensures we don't have
any empty nodes which would also cause problems.
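
The failure and the key adjustment can be reproduced on a toy model with a single inner node and two children. This is a sketch under strong simplifying assumptions (no locking, no splitting, no rebalancing); `ToyInnerNode`, `Slot`, `Range` and `make_toy` are hypothetical and unrelated to the real data structures in unwind-dw2-btree.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct Range { std::uintptr_t base; std::uintptr_t size; };

// One child of the toy inner node: it holds every address <= separator that
// is not already covered by a child to its left.
struct Slot { std::uintptr_t separator; std::vector<Range> entries; };

struct ToyInnerNode
{
  std::vector<Slot> slots;  // the last slot uses an all-ones separator
  bool adjust_fence;        // true models the patched insertion

  std::size_t find_slot(std::uintptr_t addr) const
  {
    std::size_t i = 0;
    while (i + 1 < slots.size() && slots[i].separator < addr)
      ++i;
    return i;
  }

  void insert(std::uintptr_t base, std::uintptr_t size)
  {
    Slot& s = slots[find_slot(base)];
    // The adjustment: a separator in the middle of the new range is raised
    // to the last address of the range, base + size - 1.
    if (adjust_fence && s.separator < base + size - 1)
      s.separator = base + size - 1;
    s.entries.push_back(Range{base, size});
  }

  std::optional<Range> lookup(std::uintptr_t addr) const
  {
    for (const Range& r : slots[find_slot(addr)].entries)
      if (addr - r.base < r.size)  // unsigned wrap rejects addr < base
        return r;
    return std::nullopt;
  }
};

// Two children split at key 0x4f, loosely following the scenario above.
inline ToyInnerNode make_toy(bool adjust_fence)
{
  return ToyInnerNode{{Slot{0x4f, {}}, Slot{UINTPTR_MAX, {}}}, adjust_fence};
}
```

With `make_toy(false)`, inserting base 0x4c with size 8 leaves the key at 0x4f, so a lookup of 0x50 is routed to the right-hand child and fails. With `make_toy(true)` the key becomes 0x53 and the same lookup succeeds.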

The patch adjusts the keys in two spots, once for the current node being
walked (the last hunk in the header, with large comment trying to explain
it) and once during inner node splitting in a parent node if we'd otherwise
try to add that key in the middle of the range being inserted into the
parent node (in that case it would be missed in the last hunk).
The testcase covers both of those spots, so succeeds with GCC 12 (which
didn't have btrees) and fails with vanilla GCC trunk and also fails if
either the
  if (fence < base + size - 1)
    fence = iter->content.children[slot].separator = base + size - 1;
or
  if (left_fence >= target && left_fence < target + size - 1)
    left_fence = target + size - 1;
hunk is removed (of course, only with the current node sizes, i.e. up to
15 children of inner nodes and up to 10 entries in leaf nodes).

2025-03-10  Jakub Jelinek  <jakub@redhat.com>
    Michael Leuchtenburg  <michael@slashhome.org>

PR libgcc/119151
* unwind-dw2-btree.h (btree_split_inner): Add size argument.  If
left_fence is in the middle of [target,target + size - 1] range,
increase it to target + size - 1.
(btree_insert): Adjust btree_split_inner caller.  If fence is smaller
than base + size - 1, increase it and separator of the slot to
base + size - 1.

* gcc.dg/pr119151.c: New test.

(cherry picked from commit 21109b37e8585a7a1b27650fcbf1749380016108)

c++: Update TYPE_FIELDS of variant types if cp_parser_late_parsing_default_args etc. modify it [PR98533]

The following testcases ICE during type verification, because TYPE_FIELDS
of e.g. S RECORD_TYPE in pr119123.C is different from TYPE_FIELDS of const S.
Various decls are added to S's TYPE_FIELDS first, then finish_struct
indirectly calls fixup_type_variants to sync the variant copies.
But later on cp_parser_class_specifier calls
cp_parser_late_parsing_default_args and that apparently adds a lambda
type (from default argument) to TYPE_FIELDS of S.
Dunno if that is right or not, assuming it is right, the following
patch fixes it by updating TYPE_FIELDS of variant types if there were
any changes in the various functions cp_parser_class_specifier defers and
calls on the outermost enclosing class.
There was quite a lot of code repetition already before, so the patch
uses a lambda to avoid the repetitions.
To my surprise, in some of the contract testcases (
g++.dg/contracts/contracts-friend1.C
g++.dg/contracts/contracts-nested-class1.C
g++.dg/contracts/contracts-nested-class2.C
g++.dg/contracts/contracts-redecl7.C
g++.dg/contracts/contracts-redecl8.C
) it is actually setting class_type and pushing TRANSLATION_UNIT_DECL
rather than some class types in some cases.

Or should the lambda pushing into the containing class be somehow avoided?

2025-03-06 Jakub Jelinek <jakub@redhat.com>

PR c++/98533
PR c++/119123
* parser.cc (cp_parser_class_specifier): Update TYPE_FIELDS of
variant types in case cp_parser_late_parsing_default_args etc. change
TYPE_FIELDS on the main variant. Add switch_to_class lambda and
use it to simplify repeated class switching code.

* g++.dg/cpp0x/pr98533.C: New test.
* g++.dg/cpp0x/pr119123.C: New test.

(cherry picked from commit 179e01085b0aed111ef1f7908c4b87c800f880e9)

value-range: Fix up irange::union_bitmask [PR118953]

The following testcase is miscompiled during evrp.
Before vrp, we have (from ccp):
  # RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 VALUE 0x2d
  _3 = _2 + 18446744073708503085;
...
  # RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 VALUE 0x59
  _6 = (long long unsigned int) _5;
  # RANGE [irange] int [-INF, +INF] MASK 0xffffc000 VALUE 0x34
  _7 = k_11 + -1048524;
  switch (_7) <default: <L5> [33.33%], case 8: <L7> [33.33%], case 24: <L6> [33.33%], case 32: <L6> [33.33%]>
...
  # RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc07d VALUE 0x0
  # i_20 = PHI <_3(4), 0(3), _6(2)>
and evrp is now trying to figure out range for i_20 in range_of_phi.

All the ranges and MASK/VALUE pairs above are correct for the testcase:
k_11 (and _2, which is based on it) is the result of a multiplication by a
constant with the low 14 bits cleared, and then some numbers are added to it.

There is an obvious missed optimization for which I've filed PR119039,
simplify_switch_using_ranges could see that all the labels but default
are unreachable because the controlling expression has
MASK 0xffffc000 VALUE 0x34 and none of 8, 24 and 32 satisfy that.

Anyway, during range_of_phi for i_20, we process the PHI arguments
in order.  For the _3(4) case, we figure out that it is reachable
through the case 24: case 32: labels only of the switch and that
0x34 - 0x2d is 7, so derive
[irange] long long unsigned int [17, 17][25, 25] MASK 0xffffffffffffc000 VALUE 0x2d
(the MASK/VALUE just got inherited from the _3 earlier range).
Now (not surprisingly, because those labels aren't actually reachable),
that range is inconsistent: 0x2d is 45, so there is a conflict between the
values and the irange_bitmask.
The value-range.{h,cc} code differentiates between the actually stored
irange_bitmask, which is that MASK 0xffffffffffffc000 VALUE 0x2d, and the
semantic bitmask, which is what get_bitmask returns.  That is
  // The mask inherent in the range is calculated on-demand.  For
  // example, [0,255] does not have known bits set by default.  This
  // saves us considerable time, because setting it at creation incurs
  // a large penalty for irange::set.  At the time of writing there
  // was a 5% slowdown in VRP if we kept the mask precisely up to date
  // at all times.  Instead, we default to -1 and set it when
  // explicitly requested.  However, this function will always return
  // the correct mask.
  //
  // This also means that the mask may have a finer granularity than
  // the range and thus contradict it.  Think of the mask as an
  // enhancement to the range.  For example:
  //
  // [3, 1000] MASK 0xfffffffe VALUE 0x0
  //
  // 3 is in the range endpoints, but is excluded per the known 0 bits
  // in the mask.
  //
  // See also the note in irange_bitmask::intersect.
  irange_bitmask bm
    = get_bitmask_from_range (type (), lower_bound (), upper_bound ());
  if (!m_bitmask.unknown_p ())
    bm.intersect (m_bitmask);
Here get_bitmask_from_range yields MASK 0x1f VALUE 0x0, and that gets
intersected with the stored MASK 0xffffffffffffc000 VALUE 0x2d.
This triggers the ugly special case in irange_bitmask::intersect:
  // If we have two known bits that are incompatible, the resulting
  // bit is undefined.  It is unclear whether we should set the entire
  // range to UNDEFINED, or just a subset of it.  For now, set the
  // entire bitmask to unknown (VARYING).
  if (wi::bit_and (~(m_mask | src.m_mask),
                   m_value ^ src.m_value) != 0)
    {
      unsigned prec = m_mask.get_precision ();
      m_mask = wi::minus_one (prec);
      m_value = wi::zero (prec);
    }
so the semantic bitmask is actually MASK 0xffffffffffffffff VALUE 0x0.

Next, range_of_phi attempts to union it with the 0(3) PHI argument,
and during irange::union_ first adds the [0,0] to the subranges, so
[irange] long long unsigned int [0, 0][17, 17][25, 25] MASK 0xffffffffffffc000 VALUE 0x2d
and then goes on to irange::union_bitmask which does
  if (m_bitmask == r.m_bitmask)
    return false;
  irange_bitmask bm = get_bitmask ();
  irange_bitmask save = bm;
  bm.union_ (r.get_bitmask ());
  if (save == bm)
    return false;
  m_bitmask = bm;
  if (save == get_bitmask ())
    return false;
m_bitmask MASK 0xffffffffffffc000 VALUE 0x2d isn't the same as
r.m_bitmask MASK 0x0 VALUE 0x0, so we compute the semantic bitmask
(but note, not from the original range before the union, but from the
modified one; dunno if that isn't a problem as well), which is still the
VARYING/unknown_p one.  We union_ that with MASK 0x0 VALUE 0x0 and still get
MASK 0xffffffffffffffff VALUE 0x0, so nothing is updated: the semantic
bitmask didn't change, so we are fine (not!, see later).

Except then we try to union with the third PHI argument.  And, because the
edge to that comes only from the case 8: label and there is a known
difference between the two, the argument has actually already been replaced
earlier by the constant 45(2).  So, irange::union_ adds the [45, 45] range
to the list of subranges, but voila, 45 is 0x2d and satisfies the stored
MASK 0xffffffffffffc000 VALUE 0x2d, and so the semantic bitmask changed
from MASK 0xffffffffffffffff VALUE 0x0 to MASK 0xffffffffffffc000 VALUE 0x2d
by that addition.  Eventually, we just optimize this to
[irange] long long unsigned int [45, 45] because that is the only range
which satisfies the bitmask.  And that is wrong, at runtime i_20 has
value 0.

The following patch attempts to detect this case where get_bitmask
turns some non-VARYING m_bitmask into a VARYING one because of a conflict
and in that case makes sure m_bitmask is actually updated rather than
left unmodified, so that a later union_ doesn't cause problems.

I also wonder whether e.g. get_bitmask couldn't have a special case for this
and, if bm.intersect (m_bitmask); yields unknown_p from something not
originally unknown_p, perhaps choose to just use the get_bitmask_from_range
value and ignore the stored m_bitmask.  Though, dunno how union_bitmask
in that case would figure out it needs to update m_bitmask.

2025-03-05  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/118953
* value-range.cc (irange::union_bitmask): Update m_bitmask if
get_bitmask () is unknown_p and m_bitmask is not even when the
semantic bitmask didn't change and returning false.

* gcc.dg/torture/pr118953.c: New test.

(cherry picked from commit 54da358ff51ded726fe7c026fa59c8db0a1b72ed)