Thomas Schwinge [Wed, 16 Apr 2025 14:52:08 +0000 (16:52 +0200)]
Remove 'ALWAYS_INLINE' workaround in 'libgomp.c++/target-exceptions-pr118794-1.C'
With commit ca9cffe737d20953082333dacebb65d4261e0d0c
"For nvptx offloading, make sure to emit C++ constructor, destructor aliases [PR97106]",
we're able to remove the 'ALWAYS_INLINE' workaround added in
commit fe283dba774be57b705a7a871b000d2894d2e553
"GCN, nvptx: Support '-mfake-exceptions', and use it for offloading compilation [PR118794]".
Thomas Schwinge [Fri, 28 Mar 2025 08:20:49 +0000 (09:20 +0100)]
GCN, nvptx: Support '-mfake-exceptions', and use it for offloading compilation [PR118794]
With '-mfake-exceptions' enabled, the user-visible behavior in presence of
exception handling constructs changes such that the compile-time
'sorry, unimplemented: exception handling not supported' is skipped, code
generation proceeds, and instead, exception handling constructs 'abort' at
run time. (..., or don't, if they're in dead code.)
ld: error: undefined symbol: _Unwind_RaiseException
>>> referenced by eh_throw.cc:93 ([...]/source-gcc/libstdc++-v3/libsupc++/eh_throw.cc:93)
>>> eh_throw.o:(__cxa_throw) in archive /srv/data/tschwinge/amd-instinct2/gcc/build/submit-light-target_gcn/build-gcc/amdgcn-amdhsa/gfx908/libstdc++-v3/src/.libs/libstdc++.a
[...]
collect2: error: ld returned 1 exit status
..., and/or:
ld: error: undefined symbol: _Unwind_Resume_or_Rethrow
>>> referenced by eh_throw.cc:129 ([...]/source-gcc/libstdc++-v3/libsupc++/eh_throw.cc:129)
>>> eh_throw.o:(__cxa_rethrow) in archive /srv/data/tschwinge/amd-instinct2/gcc/build/submit-light-target_gcn/build-gcc/amdgcn-amdhsa/gfx908/libstdc++-v3/src/.libs/libstdc++.a
[...]
collect2: error: ld returned 1 exit status
..., and nvptx:
unresolved symbol _Unwind_RaiseException
collect2: error: ld returned 1 exit status
..., or:
unresolved symbol _Unwind_Resume_or_Rethrow
collect2: error: ld returned 1 exit status
For both GCN, nvptx, this each progresses ~25 'check-gcc-c++',
and ~10 'check-target-libstdc++-v3' test cases:
[-FAIL:-]{+PASS:+} [...] (test for excess errors)
..., with (if applicable, for most of them):
[-UNRESOLVED:-]{+PASS:+} [...] [-compilation failed to produce executable-]{+execution test+}
..., or some 'FAIL: [...] execution test' where these test cases now FAIL when
attempting to use these interfaces, or, if applicable, FAIL due to run-time
'GCC/nvptx: sorry, unimplemented: dynamic stack allocation not supported'.
Thomas Schwinge [Tue, 18 Mar 2025 09:10:30 +0000 (10:10 +0100)]
GCN, nvptx: Define '_Unwind_DeleteException'
This resolves GCN:
ld: error: undefined symbol: _Unwind_DeleteException
>>> referenced by eh_catch.cc:109 ([...]/source-gcc/libstdc++-v3/libsupc++/eh_catch.cc:109)
>>> eh_catch.o:(__cxa_end_catch) in archive [...]/build-gcc/amdgcn-amdhsa/libstdc++-v3/src/.libs/libstdc++.a
[...]
collect2: error: ld returned 1 exit status
..., and nvptx:
unresolved symbol _Unwind_DeleteException
collect2: error: ld returned 1 exit status
For both GCN, nvptx, this each progresses ~100 'check-gcc-c++',
and ~500 'check-target-libstdc++-v3' test cases:
[-FAIL:-]{+PASS:+} [...] (test for excess errors)
..., with (if applicable, for most of them):
[-UNRESOLVED:-]{+PASS:+} [...] [-compilation failed to produce executable-]{+execution test+}
..., or just a few 'FAIL: [...] execution test' where these test cases now
FAIL for unrelated reasons, or, if applicable, FAIL due to run-time
'GCC/nvptx: sorry, unimplemented: dynamic stack allocation not supported'.
Thomas Schwinge [Sat, 5 Apr 2025 21:11:23 +0000 (23:11 +0200)]
GCN, nvptx libstdc++: Force use of '__atomic' builtins [PR119645]
For both GCN, nvptx, this gets rid of 'configure'-time:
configure: WARNING: No native atomic operations are provided for this platform.
configure: WARNING: They will be faked using a mutex.
configure: WARNING: Performance of certain classes will degrade as a result.
..., and changes:
-checking for lock policy for shared_ptr reference counts... mutex
+checking for lock policy for shared_ptr reference counts... atomic
That means, '[...]/[target]/libstdc++-v3/', 'Makefile's change:
..., and '[...]/[target]/libstdc++-v3/config.h' changes:
/* Defined if shared_ptr reference counting should use atomic operations. */
-/* #undef HAVE_ATOMIC_LOCK_POLICY */
+#define HAVE_ATOMIC_LOCK_POLICY 1
/* Define if the compiler supports C++11 atomics. */
-/* #undef _GLIBCXX_ATOMIC_BUILTINS */
+#define _GLIBCXX_ATOMIC_BUILTINS 1
..., and '[...]/[target]/libstdc++-v3/include/[target]/bits/c++config.h'
changes:
/* Defined if shared_ptr reference counting should use atomic operations. */
-/* #undef _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY */
+#define _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY 1
/* Define if the compiler supports C++11 atomics. */
-/* #undef _GLIBCXX_ATOMIC_BUILTINS */
+#define _GLIBCXX_ATOMIC_BUILTINS 1
This means that '[...]/[target]/libstdc++-v3/libsupc++/atomicity.cc',
'[...]/[target]/libstdc++-v3/libsupc++/atomicity.o' then uses atomic
instructions for synchronization instead of C++ static local variables, which
in turn for their guard variables, via 'libstdc++-v3/libsupc++/guard.cc', used
'libgcc/gthr.h' recursive mutexes, which currently are unsupported for GCN.
For GCN, this turns ~500 libstdc++ execution test FAILs into PASSes, and also
progresses:
PASS: g++.dg/tree-ssa/pr20458.C -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++17 execution test
PASS: g++.dg/tree-ssa/pr20458.C -std=gnu++26 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++26 execution test
UNSUPPORTED: g++.dg/tree-ssa/pr20458.C -std=gnu++98: exception handling not supported
(For nvptx, there is no effective change, due to other misconfiguration.)
Thomas Schwinge [Sun, 6 Apr 2025 15:44:18 +0000 (17:44 +0200)]
nvptx: Support '-mfake-ptx-alloca': defer failure to run-time 'alloca' usage
Follow-up to commit 1146410c0feb0e82c689b1333fdf530a2b34dc2b
"nvptx: Support '-mfake-ptx-alloca'". '-mfake-ptx-alloca' is applicable only
for configurations where PTX 'alloca' is not supported, where target libraries
are built with it enabled (that is, libstdc++, libgfortran).
This change progresses:
[-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++26 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++26 [-compilation failed to produce executable-]{+execution test+}
UNSUPPORTED: g++.dg/tree-ssa/pr20458.C -std=gnu++98: exception handling not supported
..., and "enables" a few test cases:
FAIL: g++.old-deja/g++.other/sibcall1.C -std=gnu++17 (test for excess errors)
[Etc.]
FAIL: g++.old-deja/g++.other/unchanging1.C -std=gnu++17 (test for excess errors)
[Etc.]
..., which now (unrelatedly to 'alloca', and in the same way as configurations
where PTX 'alloca' is supported) FAIL due to:
unresolved symbol _Unwind_DeleteException
collect2: error: ld returned 1 exit status
Most importantly, it progresses ~830 libstdc++ test cases:
[-FAIL:-]{+PASS:+} [...] (test for excess errors)
..., with (if applicable, for most of them):
[-UNRESOLVED:-]{+PASS:+} [...] [-compilation failed to produce executable-]{+execution test+}
..., or just a few 'FAIL: [...] execution test' where these test cases also
FAIL in configurations where PTX 'alloca' is supported, or ~120 instances of
'FAIL: [...] execution test' due to run-time
'GCC/nvptx: sorry, unimplemented: dynamic stack allocation not supported'.
| With '-mfake-ptx-alloca', libgfortran again succeeds to build, and compared
| to before, we've got only a small number of regressions due to nvptx 'ld'
| complaining about 'unresolved symbol __GCC_nvptx__PTX_alloca_not_supported':
|
| [-PASS:-]{+FAIL:+} gfortran.dg/coarray/codimension_2.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
[-FAIL:-]{+PASS:+} gfortran.dg/coarray/codimension_2.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
| [-PASS:-]{+FAIL:+} gfortran.dg/coarray/event_4.f08 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
| [-PASS:-]{+UNRESOLVED:+} gfortran.dg/coarray/event_4.f08 -fcoarray=lib -O2 -lcaf_single [-execution test-]{+compilation failed to produce executable+}
[-FAIL:-]{+PASS:+} gfortran.dg/coarray/event_4.f08 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/coarray/event_4.f08 -fcoarray=lib -O2 -lcaf_single [-compilation failed to produce executable-]{+execution test+}
| [-PASS:-]{+FAIL:+} gfortran.dg/coarray/fail_image_2.f08 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
| [-PASS:-]{+UNRESOLVED:+} gfortran.dg/coarray/fail_image_2.f08 -fcoarray=lib -O2 -lcaf_single [-execution test-]{+compilation failed to produce executable+}
[-FAIL:-]{+PASS:+} gfortran.dg/coarray/fail_image_2.f08 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/coarray/fail_image_2.f08 -fcoarray=lib -O2 -lcaf_single [-compilation failed to produce executable-]{+execution test+}
| [-PASS:-]{+FAIL:+} gfortran.dg/coarray/proc_pointer_assign_1.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
| [-PASS:-]{+UNRESOLVED:+} gfortran.dg/coarray/proc_pointer_assign_1.f90 -fcoarray=lib -O2 -lcaf_single [-execution test-]{+compilation failed to produce executable+}
[-FAIL:-]{+PASS:+} gfortran.dg/coarray/proc_pointer_assign_1.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/coarray/proc_pointer_assign_1.f90 -fcoarray=lib -O2 -lcaf_single [-compilation failed to produce executable-]{+execution test+}
| [-PASS:-]{+FAIL:+} gfortran.dg/coarray_43.f90 -O (test for excess errors)
[-FAIL:-]{+PASS:+} gfortran.dg/coarray_43.f90 -O (test for excess errors)
..., and further progresses:
[-FAIL:-]{+PASS:+} gfortran.dg/coarray_lib_comm_1.f90 -O0 (test for excess errors)
[-UNRESOLVED:-]{+FAIL:+} gfortran.dg/coarray_lib_comm_1.f90 -O0 [-compilation failed to produce executable-]{+execution test+}
[Etc.]
..., which now (unrelatedly to 'alloca', and in the same way as configurations
where PTX 'alloca' is supported) FAILs due to:
error : Prototype doesn't match for '_gfortran_caf_transfer_between_remotes' in 'input file 9 at offset 159897', first defined in 'input file 9 at offset 159897'
error : Prototype doesn't match for '_gfortran_caf_stop_numeric' in 'input file 9 at offset 159897', first defined in 'input file 9 at offset 159897'
nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)
Thomas Schwinge [Wed, 2 Apr 2025 08:25:17 +0000 (10:25 +0200)]
nvptx: Don't use PTX '.const', constant state space [PR119573]
This avoids cases where a "File uses too much global constant data" (final
executable, or single object file), and avoids cases of wrong code generation:
"error : State space incorrect for instruction 'st'" ('st.const'), or another
case where an "illegal instruction was encountered", or a lot of cases where
for two compilation units (such as a library linked with user code) we ran into
"error : Memory space doesn't match" due to differences in '.const' usage
between definition and use of a variable.
We progress:
ptxas error : File uses too much global constant data (0x1f01a bytes, 0x10000 max)
nvptx-run: cuLinkAddData failed: a PTX JIT compilation failed (CUDA_ERROR_INVALID_PTX, 218)
... into:
PASS: 20_util/to_chars/103955.cc -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} 20_util/to_chars/103955.cc -std=gnu++17 execution test
We progress:
ptxas error : File uses too much global constant data (0x36c65 bytes, 0x10000 max)
nvptx-as: ptxas returned 255 exit status
... into:
[-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c -O0 {+(test for excess errors)+}
[-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c -O1 {+(test for excess errors)+}
[-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c -O2 {+(test for excess errors)+}
[-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c -O3 -g {+(test for excess errors)+}
[-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c -Os {+(test for excess errors)+}
[-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C -O0 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C -O1 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C -O2 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C -O3 -g (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C -Os (test for excess errors)
[-FAIL:-]{+PASS:+} gfortran.dg/bind-c-contiguous-1.f90 -O0 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/bind-c-contiguous-1.f90 -O0 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} gfortran.dg/bind-c-contiguous-4.f90 -O0 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/bind-c-contiguous-4.f90 -O0 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} gfortran.dg/bind-c-contiguous-5.f90 -O0 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/bind-c-contiguous-5.f90 -O0 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} 20_util/to_chars/double.cc -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} 20_util/to_chars/double.cc -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} 20_util/to_chars/float.cc -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} 20_util/to_chars/float.cc -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} special_functions/13_ellint_3/check_value.cc -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} special_functions/13_ellint_3/check_value.cc -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} tr1/5_numerical_facilities/special_functions/14_ellint_3/check_value.cc -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} tr1/5_numerical_facilities/special_functions/14_ellint_3/check_value.cc -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
..., and progress likewise, but fail later with an unrelated error:
[-FAIL:-]{+PASS:+} ext/special_functions/hyperg/check_value.cc -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+FAIL:+} ext/special_functions/hyperg/check_value.cc -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[...]/libstdc++-v3/testsuite/ext/special_functions/hyperg/check_value.cc:12317: void test(const testcase_hyperg<Ret> (&)[Num], Ret) [with Ret = double; unsigned int Num = 19]: Assertion 'max_abs_frac < toler' failed.
..., and:
[-FAIL:-]{+PASS:+} tr1/5_numerical_facilities/special_functions/17_hyperg/check_value.cc -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+FAIL:+} tr1/5_numerical_facilities/special_functions/17_hyperg/check_value.cc -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[...]/libstdc++-v3/testsuite/tr1/5_numerical_facilities/special_functions/17_hyperg/check_value.cc:12316: void test(const testcase_hyperg<Ret> (&)[Num], Ret) [with Ret = double; unsigned int Num = 19]: Assertion 'max_abs_frac < toler' failed.
We progress:
nvptx-run: error getting kernel result: an illegal instruction was encountered (CUDA_ERROR_ILLEGAL_INSTRUCTION, 715)
... into:
PASS: g++.dg/cpp1z/inline-var1.C -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/cpp1z/inline-var1.C -std=gnu++17 execution test
PASS: g++.dg/cpp1z/inline-var1.C -std=gnu++20 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/cpp1z/inline-var1.C -std=gnu++20 execution test
PASS: g++.dg/cpp1z/inline-var1.C -std=gnu++26 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/cpp1z/inline-var1.C -std=gnu++26 execution test
(A lot of '.const' -> '.global' etc. Haven't researched what the actual
problem was.)
We progress:
ptxas /tmp/cc5TSZZp.o, line 142; error : State space incorrect for instruction 'st'
ptxas /tmp/cc5TSZZp.o, line 174; error : State space incorrect for instruction 'st'
ptxas fatal : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
... into:
[-FAIL:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -O0 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -O0 [-compilation failed to produce executable-]{+execution test+}
PASS: g++.dg/torture/builtin-clear-padding-1.C -O1 (test for excess errors)
PASS: g++.dg/torture/builtin-clear-padding-1.C -O1 execution test
[-FAIL:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -O2 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -O2 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -O3 -g (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -O3 -g [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -Os (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -Os [-compilation failed to produce executable-]{+execution test+}
This indeed tried to write ('st.const') into 's2', which was '.const'
(also: 's1' was '.const') -- even though, no explicit 'const' in
'g++.dg/torture/builtin-clear-padding-1.C'; "interesting".
We progress:
error : Memory space doesn't match for '_ZNSt3tr18__detail12__prime_listE' in 'input file 3 at offset 53085', first specified in 'input file 1 at offset 1924'
nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)
... into execution test PASS for a few dozens of libstdc++ test cases.
We progress:
error : Memory space doesn't match for '_ZNSt6locale17_S_twinned_facetsE' in 'input file 11 at offset 479903', first specified in 'input file 9 at offset 59300'
nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)
... into:
PASS: g++.dg/tree-ssa/pr20458.C -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++17 execution test
PASS: g++.dg/tree-ssa/pr20458.C -std=gnu++26 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++26 execution test
..., and likewise for a few hundreds of libstdc++ test cases.
We progress:
error : Memory space doesn't match for '_ZNSt6locale5_Impl19_S_facet_categoriesE' in 'input file 11 at offset 821962', first specified in 'input file 10 at offset 676317'
nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)
... into execution test PASS for a hundred of libstdc++ test cases.
We progress:
error : Memory space doesn't match for '_ctype_' in 'input file 22 at offset 1698331', first specified in 'input file 9 at offset 57095'
nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)
... into execution test PASS for another few libstdc++ test cases.
PR target/119573
gcc/
* config/nvptx/nvptx.cc (nvptx_encode_section_info): Don't set
'DATA_AREA_CONST' for 'TREE_CONSTANT', or 'TREE_READONLY'.
(nvptx_asm_declare_constant_name): Use '.global' instead of
'.const'.
gcc/testsuite/
* gcc.c-torture/compile/pr46534.c: Don't 'dg-skip-if' nvptx.
* gcc.target/nvptx/decl.c: Adjust.
libstdc++-v3/
* config/cpu/nvptx/t-nvptx (AM_MAKEFLAGS): Don't amend.
Thomas Schwinge [Mon, 7 Apr 2025 10:39:33 +0000 (12:39 +0200)]
nvptx: In offloading compilation, special-case certain host-setup symbol aliases: avoid unused label 'emit_ptx_alias' diagnostic
Minor fix-up for commit 65b31b3fff2fced015ded1026733605f34053796
"nvptx: In offloading compilation, special-case certain host-setup symbol aliases [PR101544]",
as of which we see for non-offloading configurations:
+[...]/source-gcc/gcc/config/nvptx/nvptx.cc: In function 'void nvptx_asm_output_def_from_decls(FILE*, tree, tree)':
+[...]/source-gcc/gcc/config/nvptx/nvptx.cc:7769:2: warning: label 'emit_ptx_alias' defined but not used [-Wunused-label]
+ 7769 | emit_ptx_alias:
+ | ^~~~~~~~~~~~~~
* intrinsic.texi (OpenACC Module OPENACC): Adjust version
references to 2.7 from 2.6.
libgomp/ChangeLog:
* libgomp.texi (Enabling OpenACC): Adjust version
references to 2.7 from 2.6.
* openacc.f90 (module openacc): Adjust openacc_version to 201811.
* openacc_lib.h (openacc_version): Adjust openacc_version to 201811.
* testsuite/libgomp.oacc-fortran/openacc_version-1.f: Adjust
test value to 201811.
* testsuite/libgomp.oacc-fortran/openacc_version-2.f90: Likewise.
OpenMP: Fix append_args handling in modify_call_for_omp_dispatch
At tree level, the addr ref is also required for array dummy arguments,
contrary to C; the GOMP_interop calls in modify_call_for_omp_dispatch
were updated accordingly (using build_fold_addr_expr).
As the GOMP_interop calls had no location data associated with them,
the init call happened as soon as executing the previous line of code,
which was confusing; solution: use the location data of the function
call itself.
PR middle-end/119662
gcc/ChangeLog:
* gimplify.cc (modify_call_for_omp_dispatch): Fix GOMP_interop
arg passing; add location info to function calls.
libgomp/ChangeLog:
* testsuite/libgomp.c/append-args-fr-1.c: New test.
* testsuite/libgomp.c/append-args-fr.h: New test.
Jonathan Wakely [Mon, 7 Apr 2025 18:52:55 +0000 (19:52 +0100)]
libstdc++: Fix use-after-free in std::format [PR119671]
When formatting floating-point values to wide strings there's a case
where we invalidate a std::wstring buffer while a std::wstring_view is
still referring to it.
libstdc++-v3/ChangeLog:
PR libstdc++/119671
* include/std/format (__formatter_fp::format): Do not invalidate
__wstr unless _M_localized returns a valid string.
* testsuite/std/format/functions/format.cc: Check wide string
formatting of floating-point types with classic locale.
Jonathan Wakely [Wed, 26 Mar 2025 11:47:05 +0000 (11:47 +0000)]
libstdc++: Replace use of __mindist in ranges::uninitialized_xxx algos [PR101587]
In r15-8980-gf4b6acfc36fb1f I introduced a new function object for
finding the smaller of two distances. In bugzilla Hewill Kang pointed
out that we still need to explicitly convert the result back to the
right difference type, because the result might be an integer-like class
type that doesn't convert to an integral type explicitly.
Rather than doing that conversion in the __mindist function object, I
think it's simpler to remove it again and just do a comparison and
assignment. We always want the result to have a specific type, so we can
just check if the value of the other type is smaller, and then convert
that to the other type if so.
libstdc++-v3/ChangeLog:
PR libstdc++/101587
* include/bits/ranges_uninitialized.h (__detail::__mindist):
Remove.
(ranges::uninitialized_copy, ranges::uninitialized_copy_n)
(ranges::uninitialized_move, ranges::uninitialized_move_n): Use
comparison and assignment instead of __mindist.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/constrained.cc:
Check with ranges that use integer-like class type for
difference type.
* testsuite/20_util/specialized_algorithms/uninitialized_move/constrained.cc:
Likewise.
Reviewed-by: Tomasz Kaminski <tkaminsk@redhat.com> Reviewed-by: Hewill Kang <hewillk@gmail.com>
(cherry picked from commit 03ac8886e5c1fa16da90276fd721a57fa9435f4f)
Jonathan Wakely [Wed, 26 Mar 2025 11:47:05 +0000 (11:47 +0000)]
libstdc++: Replace use of std::min in ranges::uninitialized_xxx algos [PR101587]
Because ranges can have any signed integer-like type as difference_type,
it's not valid to use std::min(diff1, diff2). Instead of calling
std::min with an explicit template argument, this adds a new __mindist
helper that determines the common type and uses that with std::min.
libstdc++-v3/ChangeLog:
PR libstdc++/101587
* include/bits/ranges_uninitialized.h (__detail::__mindist):
New function object.
(ranges::uninitialized_copy, ranges::uninitialized_copy_n)
(ranges::uninitialized_move, ranges::uninitialized_move_n): Use
__mindist instead of std::min.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/constrained.cc:
Check ranges with difference difference types.
* testsuite/20_util/specialized_algorithms/uninitialized_move/constrained.cc:
Likewise.
LoongArch: Add LoongArch architecture detection to __float128 support in libgfortran and libquadmath [PR119408].
In GCC14, LoongArch added __float128 as an alias for _Float128.
In commit r15-8962, support for q/Q suffixes for 128-bit floating point
numbers. This will cause the compiler to automatically link libquadmath
when compiling Fortran programs. But on LoongArch `long double` is
IEEE quad, so there is no need to implement libquadmath.
This causes link failure.
PR target/119408
libgfortran/ChangeLog:
* acinclude.m4: When checking for __float128 support, determine
whether the current architecture is LoongArch. If so, return false.
* configure: Regenerate.
libquadmath/ChangeLog:
* configure.ac: When checking for __float128 support, determine
whether the current architecture is LoongArch. If so, return false.
* configure: Regenerate.
Sigend-off-by: Xi Ruoyao <xry111@xry111.site> Sigend-off-by: Jakub Jelinek <jakub@redhat.com>
(cherry picked from commit 1534f0099c98ea14c08a401302b05edf2231f411)
Patrick Palka [Thu, 13 Mar 2025 23:55:00 +0000 (19:55 -0400)]
libstdc++: Work around C++20 tuple<tuple<any>> constraint recursion [PR116440]
The type tuple<tuple<any>> is clearly copy/move constructible, but for
reasons that are not yet completely understood checking this triggers
constraint recursion with our C++20 tuple implementation (but not the
C++17 implementation).
It turns out this recursion stems from considering the non-template
tuple(const _Elements&) constructor during the copy/move constructibility
check. Considering this constructor is ultimately redundant, since the
defaulted copy/move constructors are better matches.
GCC has a non-standard "perfect candidate" optimization[1] that causes
overload resolution to shortcut considering template candidates if we
find a (non-template) perfect candidate. So to work around this issue
(and as a general compile-time optimization) this patch turns the
problematic constructor into a template so that GCC doesn't consider it
when checking for copy/move constructibility of this tuple type.
Changing the template-ness of a constructor can affect overload
resolution (since template-ness is a tiebreaker) so there's a risk this
change could e.g. introduce overload resolution ambiguities. But the
original C++17 implementation has long defined this constructor as a
template (in order to constrain it etc), so doing the same thing in the
C++20 mode should naturally be quite safe.
The testcase still fails with Clang (in C++20 mode) since it doesn't
implement said optimization.
Jason Merrill [Mon, 7 Apr 2025 15:49:19 +0000 (11:49 -0400)]
c++: constinit and value-initialization [PR119652]
Value-initialization built an AGGR_INIT_EXPR to set AGGR_INIT_ZERO_FIRST on.
Passing that AGGR_INIT_EXPR to maybe_constant_value returned a TARGET_EXPR,
which potential_constant_expression_1 mistook for a temporary.
We shouldn't add a TARGET_EXPR to the AGGR_INIT_EXPR in this case, just like
we already avoid adding it to CONSTRUCTOR or CALL_EXPR.
PR c++/119652
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_outermost_constant_expr): Also don't add a
TARGET_EXPR around AGGR_INIT_EXPR.
Jason Merrill [Fri, 4 Apr 2025 21:34:08 +0000 (17:34 -0400)]
c++: __FUNCTION__ in lambda return type [PR118629]
In this testcase, the use of __FUNCTION__ is within a function parameter
scope, the lambda's. And P1787 changed __func__ to live in the parameter
scope. But [basic.scope.pdecl] says that the point of declaration of
__func__ is immediately before {, so in the trailing return type it isn't in
scope yet, so this __FUNCTION__ should refer to foo().
Looking first for a block scope, then a function parameter scope, gives us
the right result.
PR c++/118629
gcc/cp/ChangeLog:
* name-lookup.cc (pushdecl_outermost_localscope): Look for an
sk_block.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-__func__3.C: New test.
Jason Merrill [Wed, 29 Jan 2025 10:15:00 +0000 (05:15 -0500)]
c++: lambda in requires outside template [PR99546]
Since r10-7441 we set processing_template_decl in a requires-expression so
that we can use tsubst_expr to evaluate the requirements, but that confuses
lambdas terribly; begin_lambda_type silently returns error_mark_node and we
continue into other failures. This patch clears processing_template_decl
again while we're defining the closure and op() function, so it only remains
set while parsing the introducer (i.e. any init-captures) and building the
resulting object. This properly avoids trying to create another lambda in
tsubst_lambda_expr.
* g++.dg/cpp2a/lambda-requires2.C: New test.
* g++.dg/cpp2a/lambda-requires3.C: New test.
* g++.dg/cpp2a/lambda-requires4.C: New test.
* g++.dg/cpp2a/lambda-requires5.C: New test.
Patrick Palka [Fri, 4 Apr 2025 18:03:58 +0000 (14:03 -0400)]
c++: constraint variable used in evaluated context [PR117849]
Here we wrongly reject the type-requirement at parse time due to its use
of the constraint variable 't' within a template argument (an evaluated
context). Fix this simply by refining the "use of parameter outside
function body" error path to exclude constraint variables.
PR c++/104255 tracks the same issue for function parameters, but fixing
that would be more involved, requiring changes to the PARM_DECL case of
tsubst_expr.
PR c++/117849
gcc/cp/ChangeLog:
* semantics.cc (finish_id_expression_1): Allow use of constraint
variable outside an unevaluated context.
Patrick Palka [Thu, 3 Apr 2025 20:33:46 +0000 (16:33 -0400)]
c++: P2280R4 and speculative constexpr folding [PR119387]
Compiling the testcase in this PR uses 2.5x more memory and 6x more
time ever since r14-5979 which implements P2280R4. This is because
our speculative constexpr folding now does a lot more work trying to
fold ultimately non-constant calls to constexpr functions, and in turn
produces a lot of garbage. We do sometimes successfully fold more
thanks to P2280R4, but it seems to be trivial stuff like calls to
std::array::size or std::addressof. The benefit of P2280 therefore
doesn't seem worth the cost during speculative constexpr folding, so
this patch restricts the paper to only manifestly-constant evaluation.
PR c++/119387
gcc/cp/ChangeLog:
* constexpr.cc (p2280_active_p): New.
(cxx_eval_constant_expression) <case VAR_DECL>: Use it to
restrict P2280 relaxations.
<case PARM_DECL>: Likewise.
Eric Botcazou [Fri, 4 Apr 2025 09:45:23 +0000 (11:45 +0200)]
Ada: Fix thinko in Eigensystem for complex Hermitian matrices
The implementation solves the eigensystem for a NxN complex Hermitian matrix
by first solving it for a 2Nx2N real symmetric matrix and then interpreting
the 2Nx1 real vectors as Nx1 complex ones, but the last step does not work.
The patch fixes the last step and also performs a small cleanup throughout
the implementation, mostly in the commentary and without functional changes.
gcc/ada/
* libgnat/a-ngcoar.adb (Eigensystem): Adjust notation and fix the
layout of the real symmetric matrix in the main comment. Adjust
the layout of the associated code accordingly and correctly turn
the 2Nx1 real vectors into Nx1 complex ones.
(Eigenvalues): Minor similar tweaks.
* libgnat/a-ngrear.adb (Jacobi): Minor tweaks in the main comment.
Adjust notation and corresponding parameter names of functions.
Fix call to Unit_Matrix routine. Adjust the comment describing
the various kinds of iterations to match the implementation.
Using specific SSA names in pattern matching in `dg-final' makes tests
"unstable", in that changes in passes prior to the pass whose dump is
analyzed in the particular test may change the numbering of the SSA
variables, causing the test to start failing spuriously.
We thus switch from specific SSA names to the use of a multi-line
regular expression making use of capture groups for matching particular
variables across different statements, ensuring the test will pass
more consistently across different versions of GCC.
PR testsuite/118597
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-fncall-mask.c: Update test directives.
As noted in PR 118965, the initial interop implementation overlooked
the requirement in the OpenMP spec that at least one of the "target"
and "targetsync" modifiers is required in both the interop construct
init clause and the declare variant append_args clause.
Adding the check was fairly straightforward, but it broke about a
gazillion existing test cases. In particular, things like "init (x, y)"
which were previously accepted (and tested for being accepted) aren't
supposed to be allowed by the spec, much less things like "init (target)"
where target was previously interpreted as a variable name instead of a
modifier. Since one of the effects of the change is that at least one
modifier is always required, I found that deleting all the code that was
trying to detect and handle the no-modifier case allowed for better
diagnostics.
gcc/fortran/ChangeLog
PR middle-end/118965
* openmp.cc (gfc_parser_omp_clause_init_modifiers): Fix some
inconsistent code indentation. Remove code for recognizing
clauses without modifiers. Diagnose prefer_type without a
following paren. Adjust error message for an unrecognized modifier.
Diagnose missing target/targetsync modifier.
(gfc_match_omp_init): Fix more inconsistent code indentation.
Tomasz Kamiński [Thu, 3 Apr 2025 08:23:45 +0000 (10:23 +0200)]
libstdc++: Fix handling of field width for wide strings and characters [PR119593]
This patch corrects handling of UTF-32LE and UTF32-BE in
__unicode::__literal_encoding_is_unicode<_CharT>, so they are
recognized as unicode and functions produces correct result for wchar_t.
Use `__unicode::__field_width` to compute the estimated witdh
of the charcter for unicode wide encoding.
PR libstdc++/119593
libstdc++-v3/ChangeLog:
* include/bits/unicode.h
(__unicode::__literal_encoding_is_unicode<_CharT>):
Corrected handing for UTF-16 and UTF-32 with "LE" or "BE" suffix.
* include/std/format (__formatter_str::_S_character_width):
Define.
(__formatter_str::_S_character_width): Updated passed char
length.
* testsuite/std/format/functions/format.cc: Test for wchar_t.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Richard Biener [Fri, 7 Mar 2025 09:15:20 +0000 (10:15 +0100)]
tree-optimization/119145 - avoid stray .MASK_CALL after vectorization
When we BB vectorize an if-converted loop body we make sure to not
leave around .MASK_LOAD or .MASK_STORE created by if-conversion but
we failed to check for .MASK_CALL.
PR tree-optimization/119145
* tree-vectorizer.cc (try_vectorize_loop_1): Avoid BB
vectorizing an if-converted loop body when there's a .MASK_CALL
in the loop body.
Richard Biener [Thu, 6 Mar 2025 08:08:07 +0000 (09:08 +0100)]
middle-end/119119 - re-gimplification of empty CTOR assignments
The following testcase runs into a re-gimplification issue during
inlining when processing
MEM[(struct e *)this_2(D)].a = {};
where re-gimplification does not handle assignments in the same
way than the gimplifier but instead relies on rhs_predicate_for
and gimplifying the RHS standalone. This fails to handle
special-casing of CTORs. The is_gimple_mem_rhs_or_call predicate
already handles clobbers but not empty CTORs so we end up in
the fallback code trying to force the CTOR into a separate stmt
using a temporary - but as we have a non-copyable type here that ICEs.
The following generalizes empty CTORs in is_gimple_mem_rhs_or_call
since those need no additional re-gimplification.
PR middle-end/119119
* gimplify.cc (is_gimple_mem_rhs_or_call): All empty CTORs
are OK when not a register type.
When we vectorize a .COND_ADD reduction and apply the single-use-def
cycle optimization we can end up chosing the wrong else value for
subsequent .COND_ADD. The following rectifies this.
PR tree-optimization/119096
* tree-vect-loop.cc (vect_transform_reduction): Use the
correct else value for .COND_fn.
Richard Biener [Mon, 3 Mar 2025 08:54:15 +0000 (09:54 +0100)]
ipa/119067 - bogus TYPE_PRECISION check on VECTOR_TYPE
odr_types_equivalent_p can end up using TYPE_PRECISION on vector
types which is a no-go. The following instead uses TYPE_VECTOR_SUBPARTS
for vector types so we also end up comparing the number of vector elements.
PR ipa/119067
* ipa-devirt.cc (odr_types_equivalent_p): Check
TYPE_VECTOR_SUBPARTS for vectors.
* g++.dg/lto/pr119067_0.C: New testcase.
* g++.dg/lto/pr119067_1.C: Likewise.
We are detecting a cycle as double reduction where the inner loop
cycle has extra out-of-loop uses. This clashes at least with
assumptions from the SLP discovery code which says the cycle
isn't reachable from another SLP instance. It also was not intended
to support this case, in fact with GCC 14 we seem to generate wrong
code here.
PR tree-optimization/119057
* tree-vect-loop.cc (check_reduction_path): Add argument
specifying whether we're analyzing the inner loop of a
double reduction. Do not allow extra uses outside of the
double reduction cycle in this case.
(vect_is_simple_reduction): Adjust.
Richard Biener [Thu, 6 Mar 2025 12:48:16 +0000 (13:48 +0100)]
lto/114501 - missed free-lang-data for CONSTRUCTOR index
The following makes sure to also walk CONSTRUCTOR element indexes
which can be FIELD_DECLs, referencing otherwise unused types we
need to clean. walk_tree only walks CONSTRUCTOR element data.
PR lto/114501
* ipa-free-lang-data.cc (find_decls_types_r): Explicitly
handle CONSTRUCTORs as walk_tree handling of those is
incomplete.
Richard Biener [Fri, 28 Feb 2025 10:44:26 +0000 (11:44 +0100)]
ipa/111245 - bogus modref analysis for store in call that might throw
We currently record a kill for
*x_4(D) = always_throws ();
because we consider the store always executing since the appropriate
check for whether the stmt could throw is guarded by
!cfun->can_throw_non_call_exceptions.
PR ipa/111245
* ipa-modref.cc (modref_access_analysis::analyze_store): Do
not guard the check of whether the stmt could throw by
cfun->can_throw_non_call_exceptions.
Jonathan Wakely [Wed, 26 Mar 2025 11:21:32 +0000 (11:21 +0000)]
libstdc++: Fix std::ranges::iter_move for function references [PR119469]
The result of std::move (or a cast to an rvalue reference) on a function
reference is always an lvalue. Because std::ranges::iter_move was using
the type std::remove_reference_t<X>&& as the result of std::move, it was
giving the wrong type for function references. Use a decltype-specifier
with declval<remove_reference_t<X>>() instead of just using the
remove_reference_t<X>&& type directly. This gives the right result,
while still avoiding the cost of doing overload resolution for
std::move.
libstdc++-v3/ChangeLog:
PR libstdc++/119469
* include/bits/iterator_concepts.h (_IterMove::__result): Use
decltype-specifier instead of an explicit type.
* testsuite/24_iterators/customization_points/iter_move.cc:
Check results for function references.
Jonathan Wakely [Fri, 28 Feb 2025 21:44:41 +0000 (21:44 +0000)]
libstdc++: Fix ranges::iter_move handling of rvalues [PR106612]
The specification for std::ranges::iter_move apparently requires us to
handle types which do not satisfy std::indirectly_readable, for example
with overloaded operator* which behaves differently for different value
categories.
libstdc++-v3/ChangeLog:
PR libstdc++/106612
* include/bits/iterator_concepts.h (_IterMove::__iter_ref_t):
New alias template.
(_IterMove::__result): Use __iter_ref_t instead of
std::iter_reference_t.
(_IterMove::__type): Remove incorrect __dereferenceable
constraint.
(_IterMove::operator()): Likewise. Add correct constraints. Use
__iter_ref_t instead of std::iter_reference_t. Forward parameter
as correct value category.
(iter_swap): Add comments.
* testsuite/24_iterators/customization_points/iter_move.cc: Test
that iter_move is found by ADL and that rvalue arguments are
handled correctly.
Jonathan Wakely [Fri, 28 Mar 2025 15:41:41 +0000 (15:41 +0000)]
libstdc++: Fix -Warray-bounds warning in std::vector<bool> [PR110498]
In this case, we need to tell the compiler that the current size is not
larger than the new size so that all the existing elements can be copied
to the new storage. This avoids bogus warnings about overflowing the new
storage when the compiler can't tell that that cannot happen.
We might as well also hoist the loads of begin() and end() before the
allocation too. All callers will have loaded at least begin() before
calling _M_reallocate.
libstdc++-v3/ChangeLog:
PR libstdc++/110498
* include/bits/vector.tcc (vector<bool, A>::_M_reallocate):
Hoist loads of begin() and end() before allocation and use them
to state an unreachable condition.
* testsuite/23_containers/vector/bool/capacity/110498.cc: New
test.
Jonathan Wakely [Fri, 28 Mar 2025 15:41:41 +0000 (15:41 +0000)]
libstdc++: Fix -Wstringop-overread warning in std::vector<bool> [PR114758]
As in r13-4393-gcca06f0d6d76b0 and a few other commits, we can avoid
bogus warnings in std::vector<bool> by hoisting some loads to before the
allocation that calls operator new. This means that the compiler has
enough info to remove the dead branches that trigger bogus warnings.
On trunk this is only needed with -fno-assume-sane-operators-new-delete
but it will help on the branches where that option doesn't exist.
libstdc++-v3/ChangeLog:
PR libstdc++/114758
* include/bits/vector.tcc (vector<bool, A>::_M_fill_insert):
Hoist loads of begin() and end() before allocation.
* testsuite/23_containers/vector/bool/capacity/114758.cc: New
test.
Jonathan Wakely [Fri, 28 Mar 2025 22:00:38 +0000 (22:00 +0000)]
libstdc++: Fix bogus -Wstringop-overflow in std::vector::insert [PR117983]
This was fixed on trunk by r15-4473-g3abe751ea86e34, but that isn't
suitable for backporting. Instead, just add another unreachable
condition in std::vector::_M_range_insert so the compiler knows this
memcpy doesn't use a length originating from a negative ptrdiff_t
converted to a very positive size_t.
libstdc++-v3/ChangeLog:
PR libstdc++/117983
* include/bits/vector.tcc (vector::_M_range_insert): Add
unreachable condition to tell the compiler begin() <= end().
* testsuite/23_containers/vector/modifiers/insert/117983.cc: New
test.
Jonathan Wakely [Mon, 31 Mar 2025 11:30:44 +0000 (12:30 +0100)]
libstdc++: Fix -Warray-bounds warning in std::vector::resize [PR114945]
This is yet another false positive warning fix. This time the compiler
can't prove that when the vector has sufficient excess capacity to
append new elements, the pointer to the existing storage is not null.
libstdc++-v3/ChangeLog:
PR libstdc++/114945
* include/bits/vector.tcc (vector::_M_default_append): Add
unreachable condition so the compiler knows that _M_finish is
not null.
* testsuite/23_containers/vector/capacity/114945.cc: New test.
Thomas Schwinge [Mon, 31 Mar 2025 07:55:14 +0000 (09:55 +0200)]
GCN: Don't emit weak undefined symbols [PR119369]
This resolves all instances of PR119369
"GCN: weak undefined symbols -> execution test FAIL, 'HSA_STATUS_ERROR_VARIABLE_UNDEFINED'";
for all affected test cases, the execution test status progresses FAIL -> PASS.
This however also causes a small number of (expected) regressions, very similar
to GCC/nvptx:
[-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C -std=c++17 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C -std=c++26 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C -std=c++98 (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/attr-weakref-1.c (test for excess errors)
[-FAIL:-]{+UNRESOLVED:+} gcc.dg/attr-weakref-1.c [-execution test-]{+compilation failed to produce executable+}
This fixes a few hundreds of compilation/linking FAILs (similar to PR69506),
where the GCN/LLVM 'ld' reported:
ld: error: relocation R_AMDGPU_REL32_LO cannot be used against symbol '_ZGTtnam'; recompile with -fPIC
>>> defined in [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a(cow-stdexcept.o)
>>> referenced by cow-stdexcept.cc:259 ([...]/libstdc++-v3/src/c++11/cow-stdexcept.cc:259)
>>> cow-stdexcept.o:(_txnal_cow_string_C1_for_exceptions(void*, char const*, void*)) in archive [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a
ld: error: relocation R_AMDGPU_REL32_HI cannot be used against symbol '_ZGTtnam'; recompile with -fPIC
>>> defined in [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a(cow-stdexcept.o)
>>> referenced by cow-stdexcept.cc:259 ([...]/source-gcc/libstdc++-v3/src/c++11/cow-stdexcept.cc:259)
>>> cow-stdexcept.o:(_txnal_cow_string_C1_for_exceptions(void*, char const*, void*)) in archive [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a
[...]
..., which is:
$ c++filt _ZGTtnam
transaction clone for operator new[](unsigned long)
..., and similarly for other libitm symbols.
However, the affected test cases, if applicable, then run into execution test
FAILs, due to PR119369
"GCN: weak undefined symbols -> execution test FAIL, 'HSA_STATUS_ERROR_VARIABLE_UNDEFINED'".
PR target/119369
libstdc++-v3/
* config/cpu/gcn/cpu_defines.h: New.
* configure.host [GCN] (cpu_defines_dir): Point to it.
gcc/testsuite:
PR testsuite/116271
* gcc.dg/vect/tsvc/vect-tsvc-s176.c [TRUNCATE_TEST]: Make sure
that m stays the same as the loop bound of the middle loop.
* gcc.dg/vect/tsvc/tsvc.h (get_expected_result) <s176> [TRUNCATE_TEST]:
Adjust expected value.
Richard Biener [Wed, 5 Mar 2025 13:24:50 +0000 (14:24 +0100)]
debug/101533 - ICE with variant typedef DIE generation
There's a sanity check in gen_type_die_with_usage that trips
unnecessarily for a case where the relevant DIE has already been
generated successfully in other ways. The following keys the
existing TREE_ASM_WRITTEN check on the correct object, honoring
this and does nothing instead of ICEing for the testcase at hand.
PR debug/101533
* dwarf2out.cc (gen_type_die_with_usage): When we have
output the typedef already do nothing for a typedef variant.
Do not set TREE_ASM_WRITTEN on the type.
Richard Biener [Wed, 31 Jul 2024 08:07:45 +0000 (10:07 +0200)]
middle-end/101478 - ICE with degenerate address during gimplification
When we gimplify &MEM[0B + 4] we are re-folding the address in case
types are not canonical which ends up with a constant address that
recompute_tree_invariant_for_addr_expr ICEs on. Properly guard
that call.
PR middle-end/101478
* gimplify.cc (gimplify_addr_expr): Check we still have an
ADDR_EXPR before calling recompute_tree_invariant_for_addr_expr.
Richard Biener [Mon, 17 Feb 2025 14:53:11 +0000 (15:53 +0100)]
tree-optimization/98845 - ICE with tail-merging and DCE/DSE disabled
The following shows that tail-merging will make dead SSA defs live
in paths where it wasn't before, possibly introducing UB or as
in this case, uses of abnormals that eventually fail coalescing
later. The fix is to register such defs for stmt comparison.
PR tree-optimization/98845
* tree-ssa-tail-merge.cc (stmt_local_def): Consider a
def with no uses not local.
* gcc.dg/pr98845.c: New testcase.
* gcc.dg/pr81192.c: Adjust.
Richard Biener [Fri, 28 Feb 2025 13:09:29 +0000 (14:09 +0100)]
lto/91299 - weak definition inlined with LTO
The following fixes a thinko in the handling of interposed weak
definitions which confused the interposition check in
get_availability by setting DECL_EXTERNAL too early.
PR lto/91299
gcc/lto/
* lto-symtab.cc (lto_symtab_merge_symbols): Set DECL_EXTERNAL
only after calling get_availability.
gcc/testsuite/
* gcc.dg/lto/pr91299_0.c: New testcase.
* gcc.dg/lto/pr91299_1.c: Likewise.
Richard Biener [Fri, 28 Feb 2025 09:36:11 +0000 (10:36 +0100)]
tree-optimization/87984 - hard register assignments not preserved
The following disables redundant store elimination to hard register
variables which isn't valid.
PR tree-optimization/87984
* tree-ssa-dom.cc (dom_opt_dom_walker::optimize_stmt): Do
not perform redundant store elimination to hard register
variables.
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt):
Likewise.
When the C++ frontend clones a CTOR we do not copy ASM_EXPR constraints
fully as walk_tree does not recurse to TREE_PURPOSE of TREE_LIST nodes.
At this point doing that seems too dangerous so the following instead
avoids gimplification of ASM_EXPRs to clobber the shared constraints
and unshares it there, like it also unshares TREE_VALUE when it
re-writes a "+" output constraint to separate "=" output and matching
input constraint.
PR middle-end/66279
* gimplify.cc (gimplify_asm_expr): Copy TREE_PURPOSE before
rewriting it for "+" processing.
Marek Polacek [Tue, 25 Mar 2025 17:36:24 +0000 (13:36 -0400)]
c++: fix missing lifetime extension [PR119383]
Since r15-8011 cp_build_indirect_ref_1 won't do the *&TARGET_EXPR ->
TARGET_EXPR folding not to change its value category. That fix seems
correct but it made us stop extending the lifetime in this testcase,
causing a wrong-code issue -- extend_ref_init_temps_1 did not see
through the extra *& because it doesn't use a tree walk.
This patch reverts r15-8011 and instead handles the problem in
build_over_call by calling force_lvalue in the is_really_empty_class
case as well as in the general case.
PR c++/119383
gcc/cp/ChangeLog:
* call.cc (build_over_call): Use force_lvalue to ensure op= returns
an lvalue.
* cp-tree.h (force_lvalue): Declare.
* cvt.cc (force_lvalue): New.
* typeck.cc (cp_build_indirect_ref_1): Revert r15-8011.
Jonathan Wakely [Tue, 1 Apr 2025 10:02:43 +0000 (11:02 +0100)]
libstdc++: Avoid bogus -Walloc-size-larger-than warning in test [PR116212]
The compiler can't tell that the vector size fits in int, so it thinks
it might overflow to a negative value, which would then be a huge
positive size_t. In reality, the vector size never exceeds five.
There's no warning on trunk, so just change the local variable to use
type unsigned so that we get rid of the warning on the branches.
libstdc++-v3/ChangeLog:
PR libstdc++/116212
* testsuite/20_util/specialized_algorithms/uninitialized_move/constrained.cc:
Use unsigned for vector size.
Martin Jambor [Fri, 14 Mar 2025 15:07:01 +0000 (16:07 +0100)]
ipa: Do not modify cgraph edges from thunk clones during inlining (PR116572)
In PR 116572 we hit an assert that a thunk which does not have a body
looks like it has one. It does not, but the call_stmt of its outgoing
edge points to a statement, which should not. In fact it has several
outgoing call graph edges, which cannot be. The problem is that the
code updating the edges to reflect inlining into the master clone (an
ex-thunk, unlike the clone, which is still an unexpanded thunk) is
being updated during inling into the master clone. This patch simply
makes the code to skip unexpanded thunk clones.
gcc/ChangeLog:
2025-03-13 Martin Jambor <mjambor@suse.cz>
PR ipa/116572
* cgraph.cc (cgraph_update_edges_for_call_stmt): Do not update
edges of clones that are unexpanded thunk. Assert that the node
passed as the parameter is not an unexpanded thunk.
Jonathan Wakely [Mon, 10 Mar 2025 14:29:36 +0000 (14:29 +0000)]
libstdc++: Add static_assert to std::packaged_task::packaged_task(F&&)
LWG 4154 (approved in Wrocław, November 2024) fixed the Mandates:
precondition for std::packaged_task::packaged_task(F&&) to match what
the implementation actually requires. We already gave a diagnostic in
the right cases as required by the issue resolution, so strictly
speaking we don't need to do anything. But the current diagnostic comes
from inside the implementation of std::__invoke_r and could be more
user-friendly.
For C++17 (when std::is_invocable_r_v is available) add a static_assert
to the constructor, so the error is clear:
Iain Buclaw [Sat, 29 Mar 2025 22:16:25 +0000 (23:16 +0100)]
d: Fix error with -Warray-bounds and -O2 [PR117002]
The record layout of class types in D don't get any tail padding, so it
is possible for the `classInstanceSize' to not be a multiple of the
`classInstanceAlignment'.
Rather than setting the instance alignment on the underlying
RECORD_TYPE, instead give the type an alignment of 1, which will mark it
as TYPE_PACKED. The value of `classInstanceAlignment' is instead
applied to the DECL_ALIGN of both the static `init' symbol, and the
stack allocated variable used when generating `new' for a `scope' class.
PR d/117002
gcc/d/ChangeLog:
* decl.cc (aggregate_initializer_decl): Set explicit decl alignment of
class instance.
* expr.cc (ExprVisitor::visit (NewExp *)): Likewise.
* types.cc (TypeVisitor::visit (TypeClass *)): Mark the record type of
classes as packed.
Tobias Burnus [Mon, 31 Mar 2025 09:44:26 +0000 (11:44 +0200)]
OpenMP: modify_call_for_omp_dispatch - fix invalid memory access after 'error' [PR119541]
OpenMP requires that the number of dispatch 'interop' clauses (ninterop)
is less or equal to the number of declare variant 'append_args' interop
objects (nappend).
While 'nappend < ninterop' was diagnosed as error, the processing continues,
which lead to an invalid out-of-bounds memory access. Solution: only
process the first nappend 'interop' clauses.
gcc/ChangeLog:
PR middle-end/119541
* gimplify.cc (modify_call_for_omp_dispatch): Limit interop claues
processing by the number of append_args arguments.
Jonathan Wakely [Thu, 6 Mar 2025 21:18:21 +0000 (21:18 +0000)]
libstdc++: Make std::erase for linked lists convert to bool
LWG 4135 (approved in Wrocław, November 2024) fixes the lambda
expressions used by std::erase for std::list and std::forward_list.
Previously they attempted to copy something that isn't required to be
copyable. Instead they should convert it to bool right away.
The issue resolution also changes the lambda's parameter to be const, so
that it can't modify the elements while comparing them.
libstdc++-v3/ChangeLog:
* include/std/forward_list (erase): Change lambda to have
explicit return type and const parameter type.
* include/std/list (erase): Likewise.
* testsuite/23_containers/forward_list/erasure.cc: Check lambda
is correct.
* testsuite/23_containers/list/erasure.cc: Likewise.
Jonathan Wakely [Tue, 25 Mar 2025 13:24:08 +0000 (13:24 +0000)]
libstdc++: Optimize std::vector construction from input iterators [PR108487]
LWG 3291 make std::ranges::iota_view's iterator have input_iterator_tag
as its iterator_category, even though it satisfies the C++20
std::forward_iterator concept. This means that the traditional
std::vector::vector(InputIterator, InputIterator) constructor treats
iota_view iterators as input iterators, because it only understands the
C++17 iterator requirements, not the C++20 iterator concepts. This
results in a loop that calls emplace_back for each individual element of
the iota_view, requiring the vector to reallocate repeatedly as the
values are inserted. This makes it unnecessarily slow to construct a
vector from an iota_view.
This change adds a new _M_range_initialize_n function for initializing a
vector from a range (which doesn't have to be common) and a size. This
new function can be used by vector(InputIterator, InputIterator) when
std::ranges::distance can be used to get the size. It can also be used
by the _M_range_initialize overload that gets the size for a
Cpp17ForwardIterator pair using std::distance, and by the
vector(initializer_list) constructor.
With this new function constructing a std::vector from iota_view does
a single allocation of the correct size and so doesn't need to
reallocate in a loop.
libstdc++-v3/ChangeLog:
PR libstdc++/108487
* include/bits/stl_vector.h (vector(initializer_list)): Call
_M_range_initialize_n instead of _M_range_initialize.
(vector(InputIterator, InputIterator)): Use _M_range_initialize_n
for C++20 sized sentinels and forward iterators.
(vector::_M_range_initialize(FwIt, FwIt, forward_iterator_tag)):
Use _M_range_initialize_n.
(vector::_M_range_initialize_n): New function.
* testsuite/23_containers/vector/cons/108487.cc: New test.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/initlist-opt1.C: Match _M_range_initialize_n
instead of _M_range_initialize.
* g++.dg/tree-ssa/initlist-opt2.C: Likewise.
Jonathan Wakely [Fri, 12 Jan 2024 16:57:41 +0000 (16:57 +0000)]
libstdc++: Update tzdata to 2025a
Import the new 2025a tzdata.zi file. The leapseconds file was also
updated to have a new expiry (no new leap seconds were added).
libstdc++-v3/ChangeLog:
* include/std/chrono (__detail::__get_leap_second_info): Update
expiry date for leap seconds list.
* src/c++20/tzdata.zi: Import new file from 2025a release.
* src/c++20/tzdb.cc (tzdb_list::_Node::_S_read_leap_seconds)
Update expiry date for leap seconds list.
Chung-Lin Tang [Sun, 30 Mar 2025 21:04:45 +0000 (21:04 +0000)]
OpenACC: array reductions bug fixes
This is a merge of the v4 to v5 diff patch from:
https://gcc.gnu.org/pipermail/gcc-patches/2025-March/679682.html
This patch fixes issues found for NVPTX sm_70 testing, and another issue
related to copying to reduction buffer for worker/vector mode.
gcc/ChangeLog:
* config/gcn/gcn-tree.cc (gcn_goacc_reduction_setup): Fix array case
copy source into reduction buffer.
* config/nvptx/nvptx.cc (nvptx_expand_shared_addr): Move default size
init setting place.
(enum nvptx_builtins): Add NVPTX_BUILTIN_BAR_WARPSYNC.
(nvptx_init_builtins): Add DEF() of nvptx_builtin_bar_warpsync.
(nvptx_expand_builtin): Expand NVPTX_BUILTIN_BAR_WARPSYNC.
(nvptx_goacc_reduction_setup): Fix array case copy source into reduction
buffer.
(nvptx_goacc_reduction_fini): Add bar.warpsync for at end of vector-mode
reductions for sm_70 and above.
Martin Uecker [Sat, 23 Nov 2024 07:04:05 +0000 (08:04 +0100)]
Fix type compatibility for types with flexible array member 2/2 [PR113688,PR114713,PR117724]
For checking or computing TYPE_CANONICAL, ignore the array size when it is
the last element of a structure or union. To not get errors because of
an inconsistent number of members, zero-sized arrays which are the last
element are not ignored anymore when checking the fields of a struct.