Jakub Jelinek [Fri, 4 Apr 2025 18:52:41 +0000 (20:52 +0200)]
tailc: Don't reject all tail calls if param has addr taken [PR119616]
Before my PR119376 r15-9145 changes, suitable_for_tail_call_opt_p would
return the same value in the same caller, regardless of the calls in it.
If it fails, the caller clears opt_tailcalls, which is a reference and
therefore shared by all calls in the caller, and we only do tail recursion;
all non-recursive calls and calls that cannot be tail recursion optimized
are not tail call optimized.
For musttail calls we want to allow address taken parameters, but the
r15-9145 change effectively resulted in the behavior where if there
are just musttail calls considered, they will be tail call optimized,
and if there are also other tail call candidates (without musttail),
we clear opt_tailcalls and then error out on all the musttail calls.
The following patch fixes that by moving the address taken parameter
discovery from suitable_for_tail_call_opt_p to its single caller.
If there are addressable parameters, if !cfun->has_musttail it will
work as before, disable all tail calls in the caller but possibly
allow tail recursions. If cfun->has_musttail, it will set a new
bool automatic flag and reject non-tail recursions. This way musttail
calls can be still accepted and normal tail call candidates rejected
(and tail recursions accepted).
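The situation described above can be sketched as follows. This is a hypothetical illustration, not the testcase from the PR: `callee`, `observe`, `caller` and the `EXAMPLE_MUSTTAIL` macro are made-up names, and the macro expands to nothing on compilers that do not report support for `[[gnu::musttail]]`.

```cpp
// Use [[gnu::musttail]] only where the compiler reports support for it;
// elsewhere the macro expands to nothing and the code is plain C++.
#if defined(__has_cpp_attribute)
# if __has_cpp_attribute(gnu::musttail)
#  define EXAMPLE_MUSTTAIL [[gnu::musttail]]
# endif
#endif
#ifndef EXAMPLE_MUSTTAIL
# define EXAMPLE_MUSTTAIL
#endif

int callee(int v) { return v + 1; }

// Hypothetical helper whose only job is to take the address of the
// caller's parameter (it does not keep the pointer).
void observe(int* p) { *p += 0; }

int caller(int x) {
  observe(&x);                          // x becomes an address-taken parameter
  if (x > 0)
    EXAMPLE_MUSTTAIL return callee(x);  // musttail call: should still be accepted
  return callee(-x);                    // ordinary candidate: may be rejected
}
```

With the change described above, the presence of the ordinary candidate on the last line should no longer cause the musttail call to be diagnosed. Whether `x` actually stays addressable depends on the optimization level, since `observe` may be inlined.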
2025-04-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119616
* tree-tailcall.cc (suitable_for_tail_call_opt_p): Move checking
for addressable parameters from here ...
(find_tail_calls): ... to here. If cfun->has_musttail, don't clear
opt_tailcalls for it, instead set a local flag and punt if we can't
tail recurse optimize it.
Jakub Jelinek [Fri, 4 Apr 2025 18:51:50 +0000 (20:51 +0200)]
cfgrtl: Remove REG_EH_REGION notes from tail calls [PR119613]
In PR119491 r15-9154 I've allowed some useless EH regions for musttail
calls (if there are no non-debug/clobber stmts before resx which resumes
external throwing).
Now, for -O1+ (but not -O0/-Og) there is a cleanup_eh pass after it
which should optimize that away.
The following testcase ICEs at -O0 though, the cleanup_eh in that case
is before the musttail pass and dunno why it didn't actually optimize
it away.
The following patch catches that during expansion and just removes the note,
which causes EH cleanups to do the rest. A tail call, even when it throws,
will not throw while the musttail caller's frame is still on the stack; it
will throw only after that, so REG_EH_REGION for it is irrelevant (just as
it would never have been set before the r15-9154 changes).
Iain Sandoe [Fri, 21 Mar 2025 10:22:58 +0000 (10:22 +0000)]
libgcobol: Check if the target needs libm.
Use the libtool config check and $(LIBM).
libgcobol/ChangeLog:
* Makefile.am: Use $(LIBM) to add the math lib when
it is needed.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Check if the target wants libm.
Bob Dubner [Fri, 4 Apr 2025 17:48:58 +0000 (13:48 -0400)]
cobol: Eliminate cobolworx UAT errors when compiling with -Os
Testcases compiled with -Os were failing because static functions and static
variables were being optimized away, because of improper data type casts, and
because strict aliasing (whatever that is) was resulting in some loss of data.
These changes eliminate those known problems.
gcc/cobol
* cobol1.cc (cobol_langhook_post_options): Implemented in order to set
flag_strict_aliasing to zero.
* genapi.cc (set_user_status): Add comment.
(parser_intrinsic_subst): Expand SHOW_PARSE information.
(psa_global): Change names of return-code and upsi globals.
(psa_FldLiteralA): Set DECL_PRESERVE_P for FldLiteralA.
* gengen.cc (show_type): Add POINTER type.
(gg_define_function_with_no_parameters): Set DECL_PRESERVE_P for COBOL-
style nested programs.
(gg_array_of_bytes): Fix bad cast.
libgcobol
* charmaps.h: Change __gg__data_return_code to 'short' type.
* constants.cc: Likewise.
Jakub Jelinek [Fri, 4 Apr 2025 18:07:37 +0000 (20:07 +0200)]
rtlanal, i386: Adjust pattern_cost and x86 constant cost [PR115910]
Below is an attempt to fix up RTX costing P1 caused by r15-775
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/thread.html#652446
@@ -21562,7 +21562,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
if (x86_64_immediate_operand (x, VOIDmode))
*total = 0;
else
- *total = 1;
+ /* movabsq is slightly more expensive than a simple instruction. */
+ *total = COSTS_N_INSNS (1) + 1;
return true;
case CONST_DOUBLE:
change. In my understanding this was partially trying to work around
weird code in pattern_cost, which uses
return cost > 0 ? cost : COSTS_N_INSNS (1);
That doesn't make sense to me. All costs smaller than COSTS_N_INSNS (1)
mean we need to have at least one instruction there which has the
COSTS_N_INSNS (1) minimal cost. So special casing just cost 0 for the
really cheap immediates which can be used pretty much everywhere but not
ones which have just tiny bit larger cost than that (1, 2 or 3) is just
weird.
So, the following patch changes that to MAX (COSTS_N_INSNS (1), cost)
which doesn't have this weird behavior where set_src_cost 0 is considered
more expensive than set_src_cost 1.
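A minimal model of the two formulas makes the anomaly concrete. The function names are illustrative only; `costs_n_insns` mirrors GCC's COSTS_N_INSNS (N) macro, which multiplies by 4.

```cpp
// Mirrors GCC's COSTS_N_INSNS (N), i.e. (N) * 4.
constexpr int costs_n_insns(int n) { return n * 4; }

// Old final step of pattern_cost: only a cost of exactly 0 (or below)
// is bumped up to one instruction.
constexpr int pattern_cost_old(int cost) {
  return cost > 0 ? cost : costs_n_insns(1);
}

// New final step: MAX (COSTS_N_INSNS (1), cost).
constexpr int pattern_cost_new(int cost) {
  return cost > costs_n_insns(1) ? cost : costs_n_insns(1);
}

// Old behavior: a source cost of 0 ends up more expensive than 1.
static_assert(pattern_cost_old(0) == 4, "0 is bumped to one insn");
static_assert(pattern_cost_old(1) == 1, "1 is returned unchanged");

// New behavior: every cost is at least one instruction.
static_assert(pattern_cost_new(0) == 4, "floor applied");
static_assert(pattern_cost_new(1) == 4, "floor applied");
static_assert(pattern_cost_new(5) == 5, "larger costs pass through");
```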
Note, pattern_cost isn't the only spot where costs are computed and normally
we often sum the subcosts of different parts of a pattern or just query
rtx costs of different parts of subexpressions, so the jump from
1 to 5 is quite significant.
Additionally, x86_64 doesn't have just 2 kinds of constants with different
costs, it has 3, signed 32-bit ones are the ones which can appear in
almost all instructions and so using cost of 0 for those looks best,
then unsigned 32-bit ones which can be done with still cheap movl
instruction (and I think some others too) and finally full 64-bit ones
which can be done only with a single movabsq instruction and are quite
costly both in instruction size and even more expensive to execute.
The following patch attempts to restore the behavior of GCC 14 with the
pattern_cost hunk fixed for the unsigned 32-bit ones and only keeps the
bigger cost for the 64-bit ones.
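The three classes of constants can be sketched as a small classifier. This is an illustration of the description above, not the code in ix86_rtx_costs: the function name is hypothetical, and the value 5 stands for COSTS_N_INSNS (1) + 1 with COSTS_N_INSNS (1) being 4.

```cpp
// Hypothetical model of the constant costs described above, for a
// 64-bit target.  Returns the cost assigned to an integer constant.
inline int x86_64_constant_cost(long long value) {
  // Sign-extended 32-bit immediates fit in almost all instructions.
  if (value >= -2147483648LL && value <= 2147483647LL)
    return 0;
  // Zero-extended 32-bit constants can still be loaded with a cheap movl.
  if (value >= 0 && value <= 0xffffffffLL)
    return 1;
  // Everything else needs movabsq: COSTS_N_INSNS (1) + 1.
  return 4 + 1;
}
```

For example, 0x7fffffff falls in the first class, 0x80000000 in the second, and 0x100000000 in the third.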
2025-04-04 Jakub Jelinek <jakub@redhat.com>
PR target/115910
* rtlanal.cc (pattern_cost): Return at least COSTS_N_INSNS (1)
rather than just COSTS_N_INSNS (1) for cost <= 0.
* config/i386/i386.cc (ix86_rtx_costs): Set *total to 1 for
TARGET_64BIT x86_64_zext_immediate_operand constants.
Patrick Palka [Fri, 4 Apr 2025 18:03:58 +0000 (14:03 -0400)]
c++: constraint variable used in evaluated context [PR117849]
Here we wrongly reject the type-requirement at parse time due to its use
of the constraint variable 't' within a template argument (an evaluated
context). Fix this simply by refining the "use of parameter outside
function body" error path to exclude constraint variables.
PR c++/104255 tracks the same issue for function parameters, but fixing
that would be more involved, requiring changes to the PARM_DECL case of
tsubst_expr.
PR c++/117849
gcc/cp/ChangeLog:
* semantics.cc (finish_id_expression_1): Allow use of constraint
variable outside an unevaluated context.
Andrew Pinski [Thu, 3 Apr 2025 01:18:50 +0000 (18:18 -0700)]
always turn return into __builtin_unreachable for noreturn functions [PR119599]
r8-3988-g356fcc67fba52b added code to turn return statements into __builtin_unreachable
calls inside noreturn functions but only while optimizing. Since -funreachable-traps
was added (r13-1204-gd68d3664253696), it is a good idea to move over to using
__builtin_unreachable (and the trap version with this option which defaults at -O0 and -Og)
instead of just a fall through even at -O0.
This also fixes a regression when inlining a noreturn function that returns at -O0 (due to always_inline)
as we would get an empty bb which has no successor edge instead of one with a call to __builtin_unreachable.
I also noticed there was no testcase testing the warning about __builtin_return inside a noreturn function
so I added a testcase there.
Bootstrapped and tested on x86_64-linux-gnu.
PR ipa/119599
gcc/ChangeLog:
* tree-cfg.cc (pass_warn_function_return::execute): Turn return statements always
into __builtin_unreachable calls.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr119599-1.c: New test.
* gcc.dg/builtin-apply5.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jakub Jelinek [Fri, 4 Apr 2025 15:27:56 +0000 (17:27 +0200)]
c++, libcpp: Allow some left shifts in the preprocessor [PR119391]
The libcpp left shift handling implements (partially) the C99-C23
wording where shifts are UB if shift count is negative, or too large,
or shifting left a negative value or shifting left non-negative value
results in something not representable in the result type (in the
preprocessor case that is intmax_t).
libcpp actually implements left shift by negative count as right shifts
by negation of the count and similarly right shifts by negative count
as left shifts by negation (not ok), sets overflow for too large shift
count (ok), doesn't check for negative values on left shift (not ok)
and checks correctly for the non-representable ones otherwise (ok).
Now, C++11 to C++17 has different behavior, whereas in C99-C23 1 << 63
in preprocessor is invalid, in C++11-17 it is valid, but 3 << 63 is
not. The wording is that left shift of negative value is UB (like in C)
and signed non-negative left shift is UB if the result isn't representable
in corresponding unsigned type (so uintmax_t for libcpp).
And then C++20 and newer says all left shifts are well defined with the
exception of bad shift counts.
In -fsanitize=undefined we handle these by
/* For signed x << y, in C99 and later, the following:
(unsigned) x >> (uprecm1 - y)
if non-zero, is undefined. */
and
/* For signed x << y, in C++11 to C++17, the following:
x < 0 || ((unsigned) x >> (uprecm1 - y))
if > 1, is undefined. */
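The three sets of language rules quoted above can be modelled for a 64-bit intmax_t as follows. This is a sketch of the standards' wording as expressed by the two sanitizer comments, not of libcpp's num_lshift; the enum and function names are made up for the illustration.

```cpp
#include <cstdint>

enum class Dialect { C99, Cxx11, Cxx20 };  // C99-C23, C++11-17, C++20 and later

// Returns true if `x << n` on a signed 64-bit operand is not well defined
// under the given dialect's rules.
inline bool lshift_is_undefined(std::int64_t x, unsigned n, Dialect d) {
  const unsigned precision = 64;
  if (n >= precision)
    return true;                 // too large shift count: bad in every dialect
  if (d == Dialect::Cxx20)
    return false;                // all other left shifts are well defined
  if (x < 0)
    return true;                 // left shift of a negative value
  // Bits that would be shifted into or past the sign bit.
  const std::uint64_t top =
      static_cast<std::uint64_t>(x) >> (precision - 1 - n);
  // C: any such bit is an overflow.  C++11-17: reaching the sign bit is
  // fine as long as the result fits the corresponding unsigned type.
  return d == Dialect::C99 ? top != 0 : top > 1;
}
```

Under this model 1 << 63 is rejected for C but accepted for C++11 to C++17, while 3 << 63 is rejected for both and accepted only for C++20 and later.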
Now, we are late in GCC 15 development, so I think making the preprocessor
more strict than it is now is undesirable, so will defer setting overflow
flag for shifts by a negative count, or for left shifts of negative values.
The following patch just makes some previously incorrectly rejected or
warned cases valid for C++11-17 and even more for C++20 and later.
2025-04-04 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/119391
* expr.cc (num_lshift): Add pfile argument. Don't set num.overflow
for !num.unsignedp in C++20 or later unless n >= precision. For
C++11 to C++17 set it if orig >> (precision - 1 - n) as logical
shift results in value > 1.
(num_binary_op): Pass pfile to num_lshift.
(num_div_op): Likewise.
arm: testsuite: restore dg-do-what-default in mve.exp
On Arm, running
make check-gcc RUNTESTFLAGS="dwarf2.exp=pr43190.c"
with a target list of "arm-qemu{,-mthumb}"
results in no errors. But running it with
make check-gcc RUNTESTFLAGS="{mve,dwarf2}.exp=pr43190.c"
results in unresolved tests while running the thumb variant. The problem
is that mve.exp is changing dg-do-what-default to "assemble", but failing
to restore the original value once its tests are complete. The result is
that all subsequent tests run with an incorrect underlying default value.
The fix is easy - save dg-do-what-default and restore it after the tests
are complete.
gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/mve.exp: Save dg-do-what-default before
changing it. Restore it once done.
Jonathan Wakely [Wed, 26 Mar 2025 11:47:05 +0000 (11:47 +0000)]
libstdc++: Replace use of __mindist in ranges::uninitialized_xxx algos [PR101587]
In r15-8980-gf4b6acfc36fb1f I introduced a new function object for
finding the smaller of two distances. In bugzilla Hewill Kang pointed
out that we still need to explicitly convert the result back to the
right difference type, because the result might be an integer-like class
type that doesn't convert to an integral type implicitly.
Rather than doing that conversion in the __mindist function object, I
think it's simpler to remove it again and just do a comparison and
assignment. We always want the result to have a specific type, so we can
just check if the value of the other type is smaller, and then convert
that to the other type if so.
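The "comparison and assignment" approach can be sketched like this. `WideDiff` is a hypothetical stand-in for an integer-like class type that only converts to an integer explicitly, and `smaller_as` is a made-up helper name; neither is the libstdc++ code.

```cpp
// Hypothetical integer-like class type: constructible from an integer,
// but convertible back only through an explicit conversion.
struct WideDiff {
  long long value;
  constexpr WideDiff(long long v) : value(v) {}
  explicit constexpr operator long long() const { return value; }
  friend constexpr bool operator<(WideDiff a, WideDiff b) {
    return a.value < b.value;
  }
};

// Keep the result in D1: only when the value of the other type is
// smaller do we convert it, explicitly, to D1.
template<typename D1, typename D2>
D1 smaller_as(D1 d1, D2 d2) {
  if (d2 < d1)
    d1 = static_cast<D1>(d2);
  return d1;
}
```

For instance, `smaller_as(10LL, WideDiff(3))` yields a `long long` with value 3, with no implicit conversion from `WideDiff` required.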
libstdc++-v3/ChangeLog:
PR libstdc++/101587
* include/bits/ranges_uninitialized.h (__detail::__mindist):
Remove.
(ranges::uninitialized_copy, ranges::uninitialized_copy_n)
(ranges::uninitialized_move, ranges::uninitialized_move_n): Use
comparison and assignment instead of __mindist.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/constrained.cc:
Check with ranges that use integer-like class type for
difference type.
* testsuite/20_util/specialized_algorithms/uninitialized_move/constrained.cc:
Likewise.
Reviewed-by: Tomasz Kaminski <tkaminsk@redhat.com>
Reviewed-by: Hewill Kang <hewillk@gmail.com>
Tomasz Kamiński [Thu, 3 Apr 2025 15:22:39 +0000 (17:22 +0200)]
libstdc++: Provide formatter for vector<bool>::reference [PR109162]
This patch implements a formatter for vector<bool>::reference, which
is part of P2286R8.
To indicate partial support we define __glibcxx_format_ranges macro
value 1, without defining __cpp_lib_format_ranges.
To avoid including the whole content of the <format> header, we
introduce a new bits/formatfwd.h header that forward declares the classes
required for the newly introduced formatter.
The signatures of the user-facing parse and format method of the provided
formatters deviate from the standard by constraining types of params:
* _Bit_reference instead of T satisfying is-vector-bool-reference<T>
* _CharT is constrained by __format::__char
* basic_format_parse_context<_CharT> for parse argument
* basic_format_context<_Out, _CharT> for format second argument
The standard specifies the last three of the above as unconstrained types, which
leads to formattable<vector<bool>::reference, char32_t> (and any other type as
char) being true.
PR libstdc++/109162
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add bits/formatfwd.h.
* include/Makefile.in: Add bits/formatfwd.h.
* include/bits/version.def: Define __glibcxx_format_ranges without
corresponding std name.
* include/bits/version.h: Regenerate.
* include/std/format (basic_format_context, __format::__char):
Move declarations to bits/formatfwd.h.
(formatter<_Tp, _CharT>): Remove default argument for _CharT
parameter, now specified in forward declaration in bits/formatfwd.h.
* include/std/vector (formatter<_Bit_reference, _CharT>): Define.
* include/bits/formatfwd.h: New file with forward declarations
for bits of std/format.
* testsuite/23_containers/vector/bool/format.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Eric Botcazou [Fri, 4 Apr 2025 09:45:23 +0000 (11:45 +0200)]
Ada: Fix thinko in Eigensystem for complex Hermitian matrices
The implementation solves the eigensystem for a NxN complex Hermitian matrix
by first solving it for a 2Nx2N real symmetric matrix and then interpreting
the 2Nx1 real vectors as Nx1 complex ones, but the last step does not work.
The patch fixes the last step and also performs a small cleanup throughout
the implementation, mostly in the commentary and without functional changes.
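For reference, one standard way to write the real symmetric embedding of a Hermitian matrix is shown below. This is the textbook construction; which block layout a-ngcoar.adb actually uses is not stated in this message, so the signs and block order here are an assumption.

```latex
H = A + iB, \qquad A^{T} = A, \quad B^{T} = -B,
\qquad
M = \begin{pmatrix} A & -B \\ B & A \end{pmatrix}, \quad M^{T} = M

H\,(u + iv) = \lambda\,(u + iv)
\;\Longleftrightarrow\;
M \begin{pmatrix} u \\ v \end{pmatrix}
  = \lambda \begin{pmatrix} u \\ v \end{pmatrix}
```

Each eigenvalue of H appears twice in M, so the last step has to pick N of the 2N real vectors and read each one as the complex vector u + iv.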
gcc/ada/
* libgnat/a-ngcoar.adb (Eigensystem): Adjust notation and fix the
layout of the real symmetric matrix in the main comment. Adjust
the layout of the associated code accordingly and correctly turn
the 2Nx1 real vectors into Nx1 complex ones.
(Eigenvalues): Minor similar tweaks.
* libgnat/a-ngrear.adb (Jacobi): Minor tweaks in the main comment.
Adjust notation and corresponding parameter names of functions.
Fix call to Unit_Matrix routine. Adjust the comment describing
the various kinds of iterations to match the implementation.
Jonathan Wakely [Thu, 3 Apr 2025 12:59:14 +0000 (13:59 +0100)]
libstdc++: Check feature test macro for std::string_view in <string>
We can use the __glibcxx_string_view macro to guard the uses of
std::string_view in <string>, instead of just checking the value of
__cplusplus. It makes no practical difference because
__glibcxx_string_view is defined for C++17 and up, but it makes it clear
to readers that the lines guarded by that macro are features that depend
on string_view.
We could be more precise and check __glibcxx_string_view >= 201606L
which is the value for the P0254R2 paper that integrated
std::string_view with std::string, but I think just checking for the
macro being defined is clear enough.
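The same idea applies in user code, using the standard __cpp_lib_string_view macro rather than the internal __glibcxx_string_view one. The sketch below is illustrative: `text_arg` and `count_spaces` are made-up names, and the fallback branch is only there so the example also builds before C++17.

```cpp
#include <cstddef>
#include <string>

// Guard on the feature test macro rather than on the value of __cplusplus,
// so the condition names the feature the code depends on.
#ifdef __cpp_lib_string_view
# include <string_view>
using text_arg = std::string_view;
#else
using text_arg = const std::string&;
#endif

// Counts the space characters in the given text.
inline std::size_t count_spaces(text_arg text) {
  std::size_t n = 0;
  for (char c : text)
    if (c == ' ')
      ++n;
  return n;
}
```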
We can also check __glibcxx_variant for the _Never_valueless_alt partial
specialization.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h: Check __glibcxx_string_view and
__glibcxx_variant instead of __cplusplus >= 201703L.
* include/bits/cow_string.h: Likewise.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jakub Jelinek [Fri, 4 Apr 2025 06:59:51 +0000 (08:59 +0200)]
tailc: Use the IPA-VRP tail call hack even for pointers [PR119614]
As the first two testcases show, even with pointers IPA-VRP can optimize
return values from functions if they have singleton ranges into just the
exact value, so we need to virtually undo that for tail calls similarly
to integers and floats. The third test just adds check that it works
even with floats (which it does).
2025-04-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119614
* tree-tailcall.cc (find_tail_calls): Handle also pointer types in the
IPA-VRP workaround.
* c-c++-common/pr119614-1.c: New test.
* c-c++-common/pr119614-2.c: New test.
* c-c++-common/pr119614-3.c: New test.
Thomas Schwinge [Wed, 2 Apr 2025 08:25:17 +0000 (10:25 +0200)]
nvptx: Don't use PTX '.const', constant state space [PR119573]
This avoids cases where a "File uses too much global constant data" (final
executable, or single object file), and avoids cases of wrong code generation:
"error : State space incorrect for instruction 'st'" ('st.const'), or another
case where an "illegal instruction was encountered", or a lot of cases where
for two compilation units (such as a library linked with user code) we ran into
"error : Memory space doesn't match" due to differences in '.const' usage
between definition and use of a variable.
We progress:
ptxas error : File uses too much global constant data (0x1f01a bytes, 0x10000 max)
nvptx-run: cuLinkAddData failed: a PTX JIT compilation failed (CUDA_ERROR_INVALID_PTX, 218)
... into:
PASS: 20_util/to_chars/103955.cc -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} 20_util/to_chars/103955.cc -std=gnu++17 execution test
We progress:
ptxas error : File uses too much global constant data (0x36c65 bytes, 0x10000 max)
nvptx-as: ptxas returned 255 exit status
... into:
[-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c -O0 {+(test for excess errors)+}
[-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c -O1 {+(test for excess errors)+}
[-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c -O2 {+(test for excess errors)+}
[-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c -O3 -g {+(test for excess errors)+}
[-UNSUPPORTED:-]{+PASS:+} gcc.c-torture/compile/pr46534.c -Os {+(test for excess errors)+}
[-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C -O0 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C -O1 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C -O2 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C -O3 -g (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/torture/pr31863.C -Os (test for excess errors)
[-FAIL:-]{+PASS:+} gfortran.dg/bind-c-contiguous-1.f90 -O0 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/bind-c-contiguous-1.f90 -O0 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} gfortran.dg/bind-c-contiguous-4.f90 -O0 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/bind-c-contiguous-4.f90 -O0 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} gfortran.dg/bind-c-contiguous-5.f90 -O0 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/bind-c-contiguous-5.f90 -O0 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} 20_util/to_chars/double.cc -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} 20_util/to_chars/double.cc -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} 20_util/to_chars/float.cc -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} 20_util/to_chars/float.cc -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} special_functions/13_ellint_3/check_value.cc -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} special_functions/13_ellint_3/check_value.cc -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} tr1/5_numerical_facilities/special_functions/14_ellint_3/check_value.cc -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} tr1/5_numerical_facilities/special_functions/14_ellint_3/check_value.cc -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
..., and progress likewise, but fail later with an unrelated error:
[-FAIL:-]{+PASS:+} ext/special_functions/hyperg/check_value.cc -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+FAIL:+} ext/special_functions/hyperg/check_value.cc -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[...]/libstdc++-v3/testsuite/ext/special_functions/hyperg/check_value.cc:12317: void test(const testcase_hyperg<Ret> (&)[Num], Ret) [with Ret = double; unsigned int Num = 19]: Assertion 'max_abs_frac < toler' failed.
..., and:
[-FAIL:-]{+PASS:+} tr1/5_numerical_facilities/special_functions/17_hyperg/check_value.cc -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+FAIL:+} tr1/5_numerical_facilities/special_functions/17_hyperg/check_value.cc -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[...]/libstdc++-v3/testsuite/tr1/5_numerical_facilities/special_functions/17_hyperg/check_value.cc:12316: void test(const testcase_hyperg<Ret> (&)[Num], Ret) [with Ret = double; unsigned int Num = 19]: Assertion 'max_abs_frac < toler' failed.
We progress:
nvptx-run: error getting kernel result: an illegal instruction was encountered (CUDA_ERROR_ILLEGAL_INSTRUCTION, 715)
... into:
PASS: g++.dg/cpp1z/inline-var1.C -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/cpp1z/inline-var1.C -std=gnu++17 execution test
PASS: g++.dg/cpp1z/inline-var1.C -std=gnu++20 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/cpp1z/inline-var1.C -std=gnu++20 execution test
PASS: g++.dg/cpp1z/inline-var1.C -std=gnu++26 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/cpp1z/inline-var1.C -std=gnu++26 execution test
(A lot of '.const' -> '.global' etc. Haven't researched what the actual
problem was.)
We progress:
ptxas /tmp/cc5TSZZp.o, line 142; error : State space incorrect for instruction 'st'
ptxas /tmp/cc5TSZZp.o, line 174; error : State space incorrect for instruction 'st'
ptxas fatal : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
... into:
[-FAIL:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -O0 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -O0 [-compilation failed to produce executable-]{+execution test+}
PASS: g++.dg/torture/builtin-clear-padding-1.C -O1 (test for excess errors)
PASS: g++.dg/torture/builtin-clear-padding-1.C -O1 execution test
[-FAIL:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -O2 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -O2 [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -O3 -g (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -O3 -g [-compilation failed to produce executable-]{+execution test+}
[-FAIL:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -Os (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/torture/builtin-clear-padding-1.C -Os [-compilation failed to produce executable-]{+execution test+}
This indeed tried to write ('st.const') into 's2', which was '.const'
(also: 's1' was '.const') -- even though, no explicit 'const' in
'g++.dg/torture/builtin-clear-padding-1.C'; "interesting".
We progress:
error : Memory space doesn't match for '_ZNSt3tr18__detail12__prime_listE' in 'input file 3 at offset 53085', first specified in 'input file 1 at offset 1924'
nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)
... into execution test PASS for a few dozens of libstdc++ test cases.
We progress:
error : Memory space doesn't match for '_ZNSt6locale17_S_twinned_facetsE' in 'input file 11 at offset 479903', first specified in 'input file 9 at offset 59300'
nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)
... into:
PASS: g++.dg/tree-ssa/pr20458.C -std=gnu++17 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++17 execution test
PASS: g++.dg/tree-ssa/pr20458.C -std=gnu++26 (test for excess errors)
[-FAIL:-]{+PASS:+} g++.dg/tree-ssa/pr20458.C -std=gnu++26 execution test
..., and likewise for a few hundreds of libstdc++ test cases.
We progress:
error : Memory space doesn't match for '_ZNSt6locale5_Impl19_S_facet_categoriesE' in 'input file 11 at offset 821962', first specified in 'input file 10 at offset 676317'
nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)
... into execution test PASS for a hundred of libstdc++ test cases.
We progress:
error : Memory space doesn't match for '_ctype_' in 'input file 22 at offset 1698331', first specified in 'input file 9 at offset 57095'
nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)
... into execution test PASS for another few libstdc++ test cases.
PR target/119573
gcc/
* config/nvptx/nvptx.cc (nvptx_encode_section_info): Don't set
'DATA_AREA_CONST' for 'TREE_CONSTANT', or 'TREE_READONLY'.
(nvptx_asm_declare_constant_name): Use '.global' instead of
'.const'.
gcc/testsuite/
* gcc.c-torture/compile/pr46534.c: Don't 'dg-skip-if' nvptx.
* gcc.target/nvptx/decl.c: Adjust.
libstdc++-v3/
* config/cpu/nvptx/t-nvptx (AM_MAKEFLAGS): Don't amend.
Patrick Palka [Thu, 3 Apr 2025 20:33:46 +0000 (16:33 -0400)]
c++: P2280R4 and speculative constexpr folding [PR119387]
Compiling the testcase in this PR uses 2.5x more memory and 6x more
time ever since r14-5979 which implements P2280R4. This is because
our speculative constexpr folding now does a lot more work trying to
fold ultimately non-constant calls to constexpr functions, and in turn
produces a lot of garbage. We do sometimes successfully fold more
thanks to P2280R4, but it seems to be trivial stuff like calls to
std::array::size or std::addressof. The benefit of P2280 therefore
doesn't seem worth the cost during speculative constexpr folding, so
this patch restricts the paper to only manifestly-constant evaluation.
PR c++/119387
gcc/cp/ChangeLog:
* constexpr.cc (p2280_active_p): New.
(cxx_eval_constant_expression) <case VAR_DECL>: Use it to
restrict P2280 relaxations.
<case PARM_DECL>: Likewise.
Jason Merrill [Tue, 1 Apr 2025 23:22:18 +0000 (19:22 -0400)]
c++/modules: inline loaded at eof
std/format/string.cc and a few other libstdc++ tests were failing with
module std with undefined references to __failed_to_parse_format_spec. This
turned out to be because since r15-8012 we don't end up calling
note_vague_linkage_fn for functions loaded after at_eof is set.
But once import_export_decl decides on COMDAT linkage, we should be able to
just clear DECL_EXTERNAL and let cgraph take it from there.
I initially made this change in import_export_decl, but decided that for GCC
15 it would be safer to limit the change to modules. For GCC 16 I'd like to
do away with DECL_NOT_REALLY_EXTERN entirely, it's been obsolete since
cgraphunit in 2003.
gcc/cp/ChangeLog:
* module.cc (module_state::read_cluster)
(post_load_processing): Clear DECL_EXTERNAL if DECL_COMDAT.
Jason Merrill [Tue, 1 Apr 2025 17:04:05 +0000 (13:04 -0400)]
c++: operator!= rewriting and arg-dep lookup
When considering an op== as a rewrite target, we need to disqualify it if
there is a matching op!= in the same scope. But add_candidates was assuming
that we could use the same set of op!= for all op==, which is wrong if
arg-dep lookup finds op== in multiple namespaces.
This broke 20_util/optional/relops/constrained.cc if the order of the ADL
set changed.
gcc/cp/ChangeLog:
* call.cc (add_candidates): Re-lookup ne_fns if we move into
another namespace.
Looking over the recently-committed change to the musttail attribute
documentation, it appears the comment in the last example was a paste-o,
as it does not agree with either what the similar example in the
-Wmaybe-musttail-local-addr documentation says, or the actual behavior
observed when running the code.
In addition, the entire section on musttail was in need of copy-editing
to put it in the present tense, avoid reference to "the user", etc. I've
attempted to clean it up here.
gcc/ChangeLog
* doc/extend.texi (Statement Attributes): Copy-edit the musttail
attribute documentation and correct the comment in the last
example.
Using specific SSA names in pattern matching in `dg-final' makes tests
"unstable", in that changes in passes prior to the pass whose dump is
analyzed in the particular test may change the numbering of the SSA
variables, causing the test to start failing spuriously.
We thus switch from specific SSA names to the use of a multi-line
regular expression making use of capture groups for matching particular
variables across different statements, ensuring the test will pass
more consistently across different versions of GCC.
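The capture-group idea can be illustrated outside the testsuite with std::regex. The real test uses a Tcl regular expression in a dg-final directive; the dump text, the names `foo` and `bar`, and the helper below are all hypothetical.

```cpp
#include <regex>
#include <string>

// Returns true if some SSA name is assigned the result of foo () and the
// very same name is later passed to bar (), whatever its number is.
inline bool result_feeds_use(const std::string& dump) {
  // (_[0-9]+) captures the SSA name; \1 requires the same name again.
  // [\s\S]*? skips any text, including newlines, between the statements.
  static const std::regex pattern(
      R"((_[0-9]+) = foo \(\);[\s\S]*?bar \(\1\);)");
  return std::regex_search(dump, pattern);
}
```

A dump containing `_3 = foo ();` followed by `bar (_3);` matches, and so does one using `_17` in both places, whereas `_3` followed by `bar (_4);` does not.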
PR testsuite/118597
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-fncall-mask.c: Update test directives.
Iain Sandoe [Tue, 25 Mar 2025 15:10:12 +0000 (15:10 +0000)]
libgcobol: Provide fallbacks for C23 strfromf32/64 functions.
strfrom{f,d,l,fN} are all C23 and might not be available in general.
This uses snprintf() to provide fall-backs where the libc does not
yet have support.
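A fallback of this kind can be as small as a forwarding wrapper. The sketch below is an assumption about the general shape, not the libgcobol code: the function name is hypothetical, it takes a plain double rather than _Float64, and it relies on the caller passing a format that is valid for both interfaces (a single floating-point conversion such as "%.2f").

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for a strfromd-style function, built on snprintf.
// Returns the number of characters that would have been written, like
// snprintf does.
inline int fallback_strfromd(char* str, std::size_t n,
                             const char* format, double value) {
  return std::snprintf(str, n, format, value);
}
```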
libgcobol/ChangeLog:
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Check for availability of strfromf32 and
strfromf64.
* libgcobol.cc (strfromf32, strfromf64): New.
Tomasz Kamiński [Thu, 3 Apr 2025 08:23:45 +0000 (10:23 +0200)]
libstdc++: Fix handling of field width for wide strings and characters [PR119593]
This patch corrects handling of UTF-32LE and UTF-32BE in
__unicode::__literal_encoding_is_unicode<_CharT>, so they are
recognized as Unicode and the function produces the correct result for wchar_t.
Use `__unicode::__field_width` to compute the estimated width
of the character for Unicode wide encodings.
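The kind of check involved can be sketched as a name comparison that accepts an optional byte-order suffix. This is a simplified, hypothetical model (exact, case-sensitive names only); it is not how __literal_encoding_is_unicode is implemented.

```cpp
#include <initializer_list>
#include <string>

// Returns true for "UTF-8", and for "UTF-16" or "UTF-32" either bare or
// followed by an "LE" or "BE" suffix.
inline bool names_unicode_encoding(const std::string& name) {
  if (name == "UTF-8")
    return true;
  for (const char* base : {"UTF-16", "UTF-32"}) {
    const std::string prefix(base);
    if (name.compare(0, prefix.size(), prefix) == 0) {
      const std::string suffix = name.substr(prefix.size());
      return suffix.empty() || suffix == "LE" || suffix == "BE";
    }
  }
  return false;
}
```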
PR libstdc++/119593
libstdc++-v3/ChangeLog:
* include/bits/unicode.h
(__unicode::__literal_encoding_is_unicode<_CharT>):
Corrected handling for UTF-16 and UTF-32 with "LE" or "BE" suffix.
* include/std/format (__formatter_str::_S_character_width):
Define.
(__formatter_str::_S_character_width): Updated passed char
length.
* testsuite/std/format/functions/format.cc: Test for wchar_t.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jakub Jelinek [Thu, 3 Apr 2025 11:21:56 +0000 (13:21 +0200)]
c++: Fix typo in RAW_DATA_CST build_list_conv subsubconv handling [PR119563]
The following testcase ICEs (the embed one actually doesn't but
dereferences random uninitialized pointer far after allocated memory)
because of a typo. In the RAW_DATA_CST handling of list conversion
where there are conversions to something other than
initializer_list<{{,un}signed ,}char>, the code now calls
implicit_conversion for all the RAW_DATA_CST elements and stores them
into subsubconvs array.
The next loop (done in a separate loop because subsubconvs[0] is
handled differently) attempts to do the
  for (i = 0; i < len; ++i)
    {
      conversion *sub = subconvs[i];
      if (sub->rank > t->rank)
	t->rank = sub->rank;
      if (sub->user_conv_p)
	t->user_conv_p = true;
      if (sub->bad_p)
	t->bad_p = true;
    }
rank/user_conv_p/bad_p merging, but I mistyped the index, the loop
iterates with j iterator and i is subconvs index, so the loop effectively
doesn't do anything interesting except for merging from one of the
subsubconvs element, if lucky within the subsubconvs array, if unlucky
not even from inside of the array.
The following patch fixes that.
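The intended merging loop can be sketched as follows. `Conv` and `merge_summaries` are simplified, hypothetical stand-ins for the structures in call.cc; the only point illustrated is that the index must be the variable the loop actually iterates with:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical stand-in for GCC's conversion summary fields.
struct Conv {
  int rank = 0;
  bool user_conv_p = false;
  bool bad_p = false;
};

// Fold rank/user_conv_p/bad_p of every sub-conversion into `total`.
// The loop variable j is also the index, which is the point of the fix.
void merge_summaries(Conv &total, const std::vector<Conv> &subs)
{
  for (std::size_t j = 0; j < subs.size(); ++j)
    {
      const Conv &sub = subs[j];
      if (sub.rank > total.rank)
        total.rank = sub.rank;
      if (sub.user_conv_p)
        total.user_conv_p = true;
      if (sub.bad_p)
        total.bad_p = true;
    }
}
```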
2025-04-03 Andrew Pinski <quic_apinski@quicinc.com>
Jakub Jelinek <jakub@redhat.com>
PR c++/119563
* call.cc (build_list_conv): Fix a typo in loop gathering
summary information from subsubconvs.
* g++.dg/cpp0x/pr119563.C: New test.
* g++.dg/cpp/embed-26.C: New test.
Jan Hubicka [Thu, 3 Apr 2025 11:06:07 +0000 (13:06 +0200)]
Fix costs of x86 move instructions at -Os
This patch fixes a problem with size costs declaring all moves to have equal size
(which was caught by the sanity check I tried in prologue move cost hook).
Costs are relative to a reg-reg move, which is two. Coincidentally that is also the size
of the encoding, so the costs should represent the typical size of a move
instruction.
The patch reduces cc1plus text size 26391115->26205707 (0.7%) and similar changes
also happen to other binaries built during bootstrap.
Bootstrapped/regtested x86_64-linux, plan to commit it tomorrow if there
are no complaints.
There are other targets that define some load/store costs to be 2 that probably
should be fixed too, but they are mostly very old ones and I don't have a way of
benchmarking them.
* config/i386/x86-tune-costs.h (ix86_size_cost): Fix sizes of move
instructions
All tests FAIL as is because they lack either or both of the .LFB0 label
and the .cfi_startproc directive:
* The 32-bit pr82142b.c test lacks both, whether as or gas is in use: as
lacks full support for the cfi directives and the .LFB0 label is only
emitted with -fasynchronous-unwind-tables.
* The 64-bit tests pr111673.c and pr82142a.c already work with gas, but
with as the cfi directives are again missing.
In addition, the 32-bit test (pr82142b.c) still FAILs because 32-bit
Solaris/x86 defaults to -mstackrealign.
To fix all this, this patch adds -fasynchronous-unwind-tables
-fdwarf2-cfi-asm to all tests to force the generation of both the .LFB0
label and .cfi_startproc (which is ok since they are compile tests). In
addition, pr82142b.c is compiled with -mno-stackrealign to avoid
platform differences.
Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.
Jakub Jelinek [Thu, 3 Apr 2025 06:46:03 +0000 (08:46 +0200)]
c-family: Regenerate c.opt.urls
On Sun, Mar 30, 2025 at 02:48:43PM +0200, Martin Uecker wrote:
> The warning -Wzero-as-null-pointer-constant is now not only supported
> in C++ but also in C. Change the documentation accordingly.
This change didn't include make regenerate-opt-urls changes, because
moving option documentation to different section can affect the *.urls
files.
Jakub Jelinek [Thu, 3 Apr 2025 06:32:09 +0000 (08:32 +0200)]
fold-const, cobol: Add native_encode_wide_int and use it in COBOL FE [PR119242]
As has been mentioned earlier, various parts of the COBOL FE and the library
aren't endian clean. In the library that means that for now we have to
live with no support for big endian targets, but in the FE it additionally
means that we cannot build cross-compilers from big endian
or pdp endian hosts to little endian targets which are otherwise supported.
The following patch attempts to fix one such spot, where it wants to encode
in target byte ordering wide_int constants into 1, 2, 4, 8 or 16 bytes.
We could wide_int_to_tree and then native_encode_expr, but so that we don't
need to build the constants, the following patch exports from fold-const.cc
a helper for native_encode_int which takes type and const wide_int_ref
reference rather than an expression.
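To illustrate what encoding an integer "in target byte ordering" means, here is a small sketch that is independent of the host's own endianness. It is not the fold-const.cc implementation: `encode_int` is a hypothetical name, and the sketch is limited to 64-bit values and to plain little/big endian (no 16-byte or pdp endian support):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Write the low `len` bytes of `value` into `out` in the requested byte
// order, regardless of the byte order of the machine running this code.
void encode_int(std::uint64_t value, unsigned char *out, std::size_t len,
                bool big_endian)
{
  assert(len >= 1 && len <= 8);
  for (std::size_t i = 0; i < len; ++i)
    {
      // Byte i counts from the least significant end of the value.
      unsigned char byte = static_cast<unsigned char>(value >> (8 * i));
      std::size_t pos = big_endian ? len - 1 - i : i;
      out[pos] = byte;
    }
}
```

Because the bytes are extracted with shifts rather than by copying the object representation, the result does not depend on the host.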
2025-04-03 Jakub Jelinek <jakub@redhat.com>
PR cobol/119242
gcc/
* fold-const.h (native_encode_wide_int): Declare.
* fold-const.cc (native_encode_wide_int): New function.
(native_encode_int): Use it.
gcc/cobol/
* genapi.cc (binary_initial_from_float128): Use
native_encode_wide_int.
The desired vw{add,sub}.wx instructions don't come up on rv32 for the
first two functions, we get v{add,sub}.vx instead.
I suppose this is an oversight, and something about the test is meant
for rv64 only, but the fact that the instruction is spelled out in the
intrinsic name and a different instruction is generated suggests
something may be wrong after all.
[testsuite] [riscv] limit mcpu-xiangshan-nanhu.c to rv64
The testcase makes the -march option conditional on rv64, and #errors
out if the desired CPU properties are not active. This makes the test
fail on rv32. Arrange to skip the test on rv32 instead, moving the
rv64 conditional.
for gcc/testsuite/ChangeLog
* gcc.target/riscv/mcpu-xiangshan-nanhu.c: Skip on non-rv64.
Some of the tests regressed with a fix for the vectorization of
shifts. The riscv cost models need to be adjusted to avoid the
unprofitable optimization. The failure of these tests has been known
since 2024-03-13, without a forthcoming fix, so I suggest we consider
it expected by now. Adjust the tests to reflect that expectation.
[testsuite] [riscv] xfail ssa-dom-cse-2 on riscv64
For the same reasons that affect alpha and other targets,
gcc.dg/tree-ssa/ssa-dom-cse-2.c fails to be optimized to the expected
return statement: the array initializer is vectorized into pairs, and
DOM cannot see through that.
Add riscv*-*-* to the list of affected lp64 platforms. riscv32 is
not affected.
for gcc/testsuite/ChangeLog
* gcc.dg/tree-ssa/ssa-dom-cse-2.c: XFAIL on riscv lp64.
Hongyu Wang [Mon, 31 Mar 2025 08:39:23 +0000 (16:39 +0800)]
APX: Emit nf variant for rotl splitter with mask [PR 119539]
For the splitter after *<rotate_insn><mode>3_mask, it now splits the pattern
to *<rotate_insn><mode>3_mask with a flag reg clobber, and it doesn't
generate the nf variant of rotate. Directly emit the nf pattern when
TARGET_APX_NF is enabled in these define_insn_and_split patterns.
gcc/ChangeLog:
PR target/119539
* config/i386/i386.md (*<insn><mode>3_mask): Emit NF variant of
rotate when APX_NF enabled, and use force_lowpart_subreg.
(*<insn><mode>3_mask_1): Likewise.
gcc/testsuite/ChangeLog:
PR target/119539
* gcc.target/i386/apx-nf-pr119539.c: New test.
This issue was specifically about a confusing mention of the "second
and third arguments to the memcpy function" when only the second one
is a pointer affected by the attribute, but reading through the entire
discussion I found other things confusing as well; e.g. in some cases
it wasn't clear whether the "arguments" were the arguments to the
attribute or the function, or exactly what a "positional argument"
was. I've tried to rewrite that part to straighten it out, as well as
some light copy-editing throughout.
gcc/ChangeLog
PR c/101440
* doc/extend.texi (Common Function Attributes): Clean up some
confusing language in the description of the "access" attribute.
Doc: Improve wording of -Werror documentation [PR58973]
gcc/ChangeLog
PR driver/58973
* common.opt (Werror, Werror=): Use less awkward wording in
description.
(pedantic-errors): Likewise.
* doc/invoke.texi (Warning Options): Likewise for -Werror and
-Werror= here.
This reverts a change in the upstream D implementation of the compiler,
as it is no longer necessary since another fix for opDispatch got
applied in the same area (merged in r12-6003-gfd43568cc54e17).
Since r15-9062-g70391e3958db79 we perform vector bitmask initialization
via the vec_duplicate expander directly. This triggered a latent bug in
ours where we failed to mask out the single bit, which resulted in an
execution FAIL of pr119114.c.
The attached patch adds the 1-masking of the broadcast operand.
Bob Dubner [Wed, 2 Apr 2025 16:18:08 +0000 (12:18 -0400)]
cobol: Plug memory leak caused by intermediate_e stack-frame variables. [PR119521]
COBOL variables with attribute intermediate_e are being allocated on
the stack frame, but their data was assigned using malloc(), without
a corresponding call to free(). For numerics, the problem is solved
with a fixed allocation of sixteen bytes for the cblc_field_t::data
member (sixteen is big enough for all data types) and with a fixed
allocation of 8,192 bytes for the alphanumeric type.
In use, the intermediate numeric data types are "shrunk" to the minimum
applicable size. The intermediate alphanumerics, generally used as
destination targets for functions, are trimmed as well.
gcc/cobol
PR cobol/119521
* genapi.cc: (parser_division): Change comment.
(parser_symbol_add): Change intermediate_t handling.
* parse.y: Multiple changes to new_alphanumeric() calls.
* parse_ante.h: Establish named constant for date function
calls. Change declaration of new_alphanumeric() function.
* symbols.cc: (new_temporary_impl): Use named constant
for default size of temporary alphanumerics.
* symbols.h: Establish MAXIMUM_ALPHA_LENGTH constant.
libgcobol
PR cobol/119521
* intrinsic.cc: (__gg__reverse): Trim final result for intermediate_e.
* libgcobol.cc: (__gg__adjust_dest_size): Abort on attempt to increase
the size of a result. (__gg__module_name): Formatting.
__gg__reverse(): Resize only intermediates
This patch addresses a number of issues with the documentation of
- None of the things in this section had @cindex entries [PR114957].
- The document formatting didn't match that of other #pragma
documentation sections.
- The effect of #pragma pack(0) wasn't documented [PR78008].
- There's a long-standing bug [PR60972] reporting that #pragma pack
and __attribute__((packed)) don't get along well. It seems worthwhile
to warn users about that since elsewhere pragmas are cross-referenced
with related or equivalent attributes.
gcc/ChangeLog
PR c/114957
PR c/78008
PR c++/60972
* doc/extend.texi (Structure-Layout Pragmas): Add @cindex
entries and reformat the pragma descriptions to match the markup
used for other pragmas. Document what #pragma pack(0) does.
Add cross-references to similar attributes.
Jakub Jelinek [Wed, 2 Apr 2025 18:02:34 +0000 (20:02 +0200)]
tailc: Deal with trivially useless EH cleanups [PR119491]
The following testcases FAIL, because EH cleanup is performed only before
IPA and then right before musttail pass.
At -O2 etc. (except for -O0/-Og) we handle musttail calls in the tailc
pass though, and we can fail at that point because the calls might appear
to throw internal exceptions which just don't do anything interesting
(perhaps have debug statements or clobber statements in them) before they
continue with resume of the exception (i.e. throw it externally).
As Richi said in the PR (and I agree), moving passes is risky at this
point, so the following patch instead teaches the tail{r,c} and musttail
passes to deal with such extra EDGE_EH edges.
It is a fairly simple thing: if we see an EDGE_EH edge from the call we
just look up where it lands and if there are no
non-debug/non-clobber/non-label statements before resx which throws
externally, such edge can be ignored for tail call optimization or
tail recursion. At other spots I just need to avoid using
single_succ/single_succ_edge because the bb might have another edge -
EDGE_EH.
To make this less risky, this is done solely for the musttail calls for now.
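The idea of looking past an extra EDGE_EH successor can be sketched with a toy edge list. `Edge` and `toy_single_non_eh_succ` are hypothetical and only model the concept; in particular, returning a null pointer when there is no unique non-EH successor is an assumption of this sketch, not a statement about the real helper in tree-tailcall.cc:

```cpp
#include <cassert>
#include <vector>

// Toy successor edge: destination block number plus an "is EH edge" flag.
struct Edge {
  int dest;
  bool is_eh;
};

// Return the only non-EH successor, or nullptr if there is not exactly one.
const Edge *toy_single_non_eh_succ(const std::vector<Edge> &succs)
{
  const Edge *found = nullptr;
  for (const Edge &e : succs)
    {
      if (e.is_eh)
        continue;          // ignore exception edges
      if (found)
        return nullptr;    // a second normal edge: not unique
      found = &e;
    }
  return found;
}
```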
2025-04-02 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119491
* tree-tailcall.cc (single_non_eh_succ_edge): New function.
(independent_of_stmt_p): Use single_non_eh_succ_edge (bb)->dest
instead of single_succ (bb).
(empty_eh_cleanup): New function.
(find_tail_calls): Diagnose throwing of exceptions which do not
propagate only if there are no EDGE_EH successor edges. If there are
and the call is musttail, use empty_eh_cleanup to find if the cleanup
is not empty. If not or the call is not musttail, use different
diagnostics. Set is_noreturn even if there are successor edges. Use
single_non_eh_succ_edge (abb) instead of single_succ_edge (abb). Punt
on internal noreturn calls.
(decrease_profile): Don't assert 0 or 1 successor edges.
(eliminate_tail_call): Use
single_non_eh_succ_edge (gsi_bb (t->call_gsi)) instead of
single_succ_edge (gsi_bb (t->call_gsi)).
(tree_optimize_tail_calls_1): Also look into basic blocks with
single succ edge which is EDGE_EH for noreturn musttail calls.
* g++.dg/opt/musttail3.C: New test.
* g++.dg/opt/musttail4.C: New test.
* g++.dg/opt/musttail5.C: New test.
Jakub Jelinek [Wed, 2 Apr 2025 17:28:20 +0000 (19:28 +0200)]
c: Fix ICEs with -fsanitize=pointer-{subtract,compare} [PR119582]
The following testcase ICEs because c_fully_fold isn't performed on the
arguments of __sanitizer_ptr_{sub,cmp} builtins and so e.g.
C_MAYBE_CONST_EXPR can leak into the gimplifier where it ICEs.
2025-04-02 Jakub Jelinek <jakub@redhat.com>
PR c/119582
* c-typeck.cc (pointer_diff, build_binary_op): Call c_fully_fold on
__sanitizer_ptr_sub or __sanitizer_ptr_cmp arguments.
As noted in PR 118965, the initial interop implementation overlooked
the requirement in the OpenMP spec that at least one of the "target"
and "targetsync" modifiers is required in both the interop construct
init clause and the declare variant append_args clause.
Adding the check was fairly straightforward, but it broke about a
gazillion existing test cases. In particular, things like "init (x, y)"
which were previously accepted (and tested for being accepted) aren't
supposed to be allowed by the spec, much less things like "init (target)"
where target was previously interpreted as a variable name instead of a
modifier. Since one of the effects of the change is that at least one
modifier is always required, I found that deleting all the code that was
trying to detect and handle the no-modifier case allowed for better
diagnostics.
gcc/fortran/ChangeLog
PR middle-end/118965
* openmp.cc (gfc_parser_omp_clause_init_modifiers): Fix some
inconsistent code indentation. Remove code for recognizing
clauses without modifiers. Diagnose prefer_type without a
following paren. Adjust error message for an unrecognized modifier.
Diagnose missing target/targetsync modifier.
(gfc_match_omp_init): Fix more inconsistent code indentation.
Iain Sandoe [Mon, 31 Mar 2025 06:02:54 +0000 (07:02 +0100)]
config, toplevel, Darwin: Pass -B instead of -L to C++ commands.
Darwin from 10.11 needs embedded rpaths to find the correct libraries at
runtime. Recent increases in hardening have made it such that the dynamic
loader will no longer fall back to using an installed libstdc++ when the
(new) linked one is not found. This means we fail configure tests (that
should pass) for runtimes that use C++.
We can resolve this by passing '-B' to the C++ command lines instead of '-L'
(-B implies -L on Darwin, but also causes a corresponding embedded rpath).
ChangeLog:
* configure: Regenerate.
* configure.ac: Use -B instead of -L to specify the C++ runtime
paths on Darwin.
libstdc++, testsuite, Darwin: Prune a new linker warning present from Xcode 16.
Darwin's linker now warns when duplicate rpaths are presented - which
happens when we emit duplicate '-B' paths. In principle, we should avoid
this in the test-suite, however at present we tend to have duplicates
because different parts of the machinery add them. At some point, it might
be nice to have an "add_option_if_missing" and apply that across the whole
of the test infra. However this is not something for late in stage 4. So
the solution here is to prune the warning - the effect of the duplicate in
the libstdc++ testsuite is not important; it will make the exes very slightly
larger but it won't alter the paths that are presented for loading the
runtimes.
libstdc++-v3/ChangeLog:
* testsuite/lib/prune.exp: Prune ld warning about duplicate
rpaths.
Richard Biener [Wed, 2 Apr 2025 11:12:58 +0000 (13:12 +0200)]
tree-optimization/119586 - aligned access to unaligned data
The following reverts parts of r15-8047, which assesses whether alignment
analysis for VMAT_STRIDED_SLP is correct by using aligned accesses
where allowed by it. As the PR shows this analysis is still incorrect,
so revert back to assuming we got it wrong.
PR tree-optimization/119586
* tree-vect-stmts.cc (vectorizable_load): Assume we got
alignment analysis for VMAT_STRIDED_SLP wrong.
(vectorizable_store): Likewise.
Jakub Jelinek [Wed, 2 Apr 2025 10:36:29 +0000 (12:36 +0200)]
doc: Extend musttail attribute docs
On Wed, Apr 02, 2025 at 10:32:20AM +0200, Richard Biener wrote:
> I wonder if we can amend the documentation to suggest to end lifetime
> of variables explicitly by proper scoping?
In the -Wmaybe-musttail-local-addr attribute description I've already
tried to show that in the example, but if you think something like
the following would make it clearer.
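A hedged sketch of the scoping hint being discussed: a local whose address is taken is confined to an inner block, so its lifetime has ended before the musttail call. The macro `SKETCH_MUSTTAIL` and the functions `fill` and `count_down` are hypothetical; the attribute is only applied when the compiler reports support for `gnu::musttail`, and otherwise the code is an ordinary recursive call:

```cpp
#include <cassert>

// Use the attribute only where the compiler says it is available.
#if defined(__has_cpp_attribute)
# if __has_cpp_attribute(gnu::musttail)
#  define SKETCH_MUSTTAIL [[gnu::musttail]]
# endif
#endif
#ifndef SKETCH_MUSTTAIL
# define SKETCH_MUSTTAIL
#endif

// Hypothetical helper that receives the address of a caller's local.
int fill(int *slot, int v)
{
  *slot = v;
  return v;
}

int count_down(int n, int acc)
{
  if (n == 0)
    return acc;
  int next;
  {
    int scratch = 0;
    fill(&scratch, n);     // the address of a local escapes here...
    next = acc + scratch;
  }                        // ...but scratch is out of scope before the call
  SKETCH_MUSTTAIL return count_down(n - 1, next);
}
```

Only values, not addresses of locals, are passed to the tail call itself.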
2025-04-02 Jakub Jelinek <jakub@redhat.com>
* doc/extend.texi (musttail statement attribute): Hint how
to avoid -Wmaybe-musttail-local-addr warnings.
Jonathan Wakely [Tue, 18 Mar 2025 18:37:01 +0000 (18:37 +0000)]
cobol: Fix incorrect use of std::remove_if
The call to std::remove_if used here doesn't remove any elements, it
just overwrites the "removed" elements with later elements, leaving the
total number of elements unchanged. Use std::list::remove_if to actually
remove those unwanted elements from the list.
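The difference can be seen in a small standalone sketch (the function names are illustrative and unrelated to symfind.cc):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>

// std::remove_if only moves the kept elements to the front; the container
// keeps its size unless the returned iterator is passed on to erase().
std::size_t size_after_algorithm_remove(std::list<int> values)
{
  auto new_end = std::remove_if(values.begin(), values.end(),
                                [](int v) { return v % 2 == 0; });
  (void) new_end;  // ignoring the result is the bug pattern
  return values.size();
}

// std::list::remove_if actually unlinks the matching nodes.
std::size_t size_after_member_remove(std::list<int> values)
{
  values.remove_if([](int v) { return v % 2 == 0; });
  return values.size();
}
```

For a five-element list with two even values, the first function still reports five elements while the second reports three.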
gcc/cobol/ChangeLog:
* symfind.cc (finalize_symbol_map2): Use std::list::remove_if
instead of std::remove_if.
Jakub Jelinek [Wed, 2 Apr 2025 08:51:42 +0000 (10:51 +0200)]
tailc: Don't fail musttail calls if they use or could use local arguments, instead warn [PR119376]
As discussed here and in bugzilla, [[clang::musttail]] attribute in clang
not just strongly asks for tail call or error, but changes behavior.
To quote:
https://clang.llvm.org/docs/AttributeReference.html#musttail
"The lifetimes of all local variables and function parameters end immediately
before the call to the function. This means that it is undefined behaviour
to pass a pointer or reference to a local variable to the called function,
which is not the case without the attribute. Clang will emit a warning in
common cases where this happens."
The GCC behavior was just to error if we can't prove the musttail callee
could not have dereferenced escaped pointers to local vars or parameters
of the caller. That is still the case for variables with non-trivial
destruction (even in clang), like vars with C++ non-trivial destructors or
variables with cleanup attribute.
The following patch changes the behavior to match that of clang, for all of
[[clang::musttail]], [[gnu::musttail]] and __attribute__((musttail)).
clang 20 actually added warning for some cases of it in
https://github.com/llvm/llvm-project/pull/109255
but it is under -Wreturn-stack-address warning.
Now, gcc doesn't have that warning, but -Wreturn-local-addr instead, and
IMHO it is better to have this under new warnings, because this isn't about
returning local address, but about passing it to a musttail call, or maybe
escaping to a musttail call. And perhaps users will appreciate they can
control it separately as well.
The patch introduces 2 new warnings.
-Wmusttail-local-addr
which is turned on by default and warns for the always dumb cases of passing
an address of a local variable or parameter to musttail call's argument.
And then
-Wmaybe-musttail-local-addr
which is only diagnosed if -Wmusttail-local-addr was not diagnosed and
diagnoses at most one (so that we don't emit 100s of warnings for one call
if 100s of vars can escape) case where an address of a local var could have
escaped to the musttail call. This is less severe, the code doesn't have
to be obviously wrong, so the warning is only enabled in -Wextra.
And I've adjusted also the documentation for this change and addition of
new warnings.
2025-04-02 Jakub Jelinek <jakub@redhat.com>
PR ipa/119376
* common.opt (Wmusttail-local-addr, Wmaybe-musttail-local-addr): New.
* tree-tailcall.cc (suitable_for_tail_call_opt_p): Don't fail for
TREE_ADDRESSABLE PARM_DECLs for musttail calls if diag_musttail.
Emit -Wmusttail-local-addr warnings.
(maybe_error_musttail): Use gimple_location instead of directly
accessing location member.
(find_tail_calls): For musttail calls if diag_musttail, don't fail
if address of local could escape to the call, instead emit
-Wmaybe-musttail-local-addr warnings. Emit
-Wmaybe-musttail-local-addr warnings also for address taken
parameters.
* common.opt.urls: Regenerate.
* doc/extend.texi (musttail statement attribute): Clarify local
variables without non-trivial destruction are considered out of scope
before the tail call instruction.
* doc/invoke.texi (-Wno-musttail-local-addr,
-Wmaybe-musttail-local-addr): Document.
* c-c++-common/musttail8.c: Expect a warning rather than error in one
case.
(f4): Add int * argument.
* c-c++-common/musttail15.c: Don't disallow for C++98.
* c-c++-common/musttail16.c: Likewise.
* c-c++-common/musttail17.c: Likewise.
* c-c++-common/musttail18.c: Likewise.
* c-c++-common/musttail19.c: Likewise. Expect a warning rather than
error in one case.
(f4): Add int * argument.
* c-c++-common/musttail20.c: Don't disallow for C++98.
* c-c++-common/musttail21.c: Likewise.
* c-c++-common/musttail28.c: New test.
* c-c++-common/musttail29.c: New test.
* c-c++-common/musttail30.c: New test.
* c-c++-common/musttail31.c: New test.
* g++.dg/ext/musttail1.C: New test.
* g++.dg/ext/musttail2.C: New test.
* g++.dg/ext/musttail3.C: New test.
bitmap_set_bit checks the original value of the bit to return it to the
caller and then only writes the new value back if it changes.
Most callers of bitmap_set_bit don't need the return value, but with the conditional store
the CPU still has to predict it correctly since gcc doesn't know how to do
that without APX on x86 (even though CMOV could do it with a dummy target).
Really if-conversion should handle this case, but for now we can fix
it.
This simple patch improves runtime by 15% for the test case in the PR,
which is more than I expected given it only has ~1.44% of the cycles, but I guess
the mispredicts caused some downstream effects.
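The two store strategies can be sketched on a single 64-bit word. This is not the bitmap.h code; the function names are hypothetical, and returning "whether the bit was newly set" is an assumption of the sketch:

```cpp
#include <cassert>
#include <cstdint>

// Conditional store: the write happens only when the bit changes, so the
// CPU has to predict a branch even if the caller ignores the result.
bool set_bit_conditional(std::uint64_t &word, unsigned bit)
{
  const std::uint64_t mask = std::uint64_t{1} << bit;
  const bool changed = (word & mask) == 0;
  if (changed)
    word |= mask;
  return changed;
}

// Unconditional store: always write the OR-ed word back; no branch is
// needed, and the stored value is identical in both cases.
bool set_bit_unconditional(std::uint64_t &word, unsigned bit)
{
  const std::uint64_t mask = std::uint64_t{1} << bit;
  const bool changed = (word & mask) == 0;
  word |= mask;
  return changed;
}
```

Both variants leave the word in the same state and return the same value; they differ only in whether the store depends on a branch.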
cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc -quiet
ran 1.15 ± 0.01 times faster than cc1plus -std=gnu++20 -O2 pr119482.cc -quiet
At least with this test case the total number of branches decreases
drastically. Even though the mispredict rate goes up slightly it is
still a big win.
$ perf stat -e branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ \
-a ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119482.cc -quiet -w
Performance counter stats for 'system wide':
41,932,957,091 branches
686,117,623 branch-misses # 1.64% of all branches
43,690.47 MiB uncore_imc/cas_count_read/
12,362.56 MiB uncore_imc/cas_count_write/
Doc: Cross-reference constructor and init_priority attributes [PR118982]
Per the issue, the discussion of these two attributes needed to be
better integrated. I also did some editing for style and readability,
and clarified that almost all targets support this feature (it is
enabled by default unless the back end disables it), not just "some".
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
gcc/ChangeLog
PR c++/118982
* doc/extend.texi (Common Function Attributes): For the
constructor/destructor attribute, be more explicit about the
relationship between the constructor attribute and
the C++ init_priority attribute, and add a cross-reference.
Also document that most targets support this.
(C++ Attributes): Similarly for the init_priority attribute.
Doc: Document enum with underlying type extension [PR117689]
This is a C23/C++11 feature that is supported as an extension with
earlier -std= options too, but was never previously documented. It
interacts with the already-documented forward enum definition extension,
so I have merged discussion of the two extensions into the same section.
gcc/ChangeLog
PR c/117689
* doc/extend.texi (Incomplete Enums): Rename to....
(Enum Extensions): This. Document support for specifying the
underlying type of an enum as an extension in all earlier C
and C++ standards. Document that a forward declaration with
underlying type is not an incomplete type, and which dialects
GCC supports that in.
c++/modules: Forbid exposures of TU-local entities in inline variables [PR119551]
An inline variable has vague linkage, and needs to be conditionally
emitted in TUs that reference it. Unfortunately this clashes with
[basic.link] p14.2, which says that we ignore the initialisers of all
variables (including inline ones), since importers will not have access
to the referenced TU-local entities to write the definition.
This patch makes such exposures be ill-formed. One case that continues
to work is if the exposure is part of the dynamic initialiser of an
inline variable; in such cases, the definition has been built as part of
the module interface unit anyway, and importers don't need to write it
out again, so such exposures are "harmless".
PR c++/119551
gcc/cp/ChangeLog:
* module.cc (trees_out::write_var_def): Only ignore non-inline
variable initializers.
gcc/testsuite/ChangeLog:
* g++.dg/modules/internal-5_a.C: Add cases that should be
ignored.
* g++.dg/modules/internal-5_b.C: Test these new cases, and make
the testcase more robust.
* g++.dg/modules/internal-11.C: New test.
* g++.dg/modules/internal-12_a.C: New test.
* g++.dg/modules/internal-12_b.C: New test.
Jonathan Wakely [Mon, 31 Mar 2025 11:30:44 +0000 (12:30 +0100)]
libstdc++: Fix -Warray-bounds warning in std::vector::resize [PR114945]
This is yet another false positive warning fix. This time the compiler
can't prove that when the vector has sufficient excess capacity to
append new elements, the pointer to the existing storage is not null.
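The general technique of telling the optimizer about an invariant it cannot prove can be sketched as below. `Storage` and `append_zeros_in_place` are hypothetical miniatures, not the vector.tcc code, and `__builtin_unreachable` is a GCC/Clang builtin, so the sketch assumes one of those compilers:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical miniature of a vector's three storage pointers.
struct Storage {
  int *start = nullptr;
  int *finish = nullptr;
  int *end_of_storage = nullptr;
};

// Append n zero elements when spare capacity exists; return false otherwise.
bool append_zeros_in_place(Storage &s, std::size_t n)
{
  if (n == 0 || static_cast<std::size_t>(s.end_of_storage - s.finish) < n)
    return false;
  // Spare capacity greater than zero implies the buffer was allocated, but
  // the optimizer may not deduce that, so state it explicitly.
  if (s.finish == nullptr)
    __builtin_unreachable();
  for (std::size_t i = 0; i < n; ++i)
    *s.finish++ = 0;
  return true;
}
```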
libstdc++-v3/ChangeLog:
PR libstdc++/114945
* include/bits/vector.tcc (vector::_M_default_append): Add
unreachable condition so the compiler knows that _M_finish is
not null.
* testsuite/23_containers/vector/capacity/114945.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tom Tromey [Wed, 21 Aug 2024 17:46:52 +0000 (11:46 -0600)]
Further use of mod_scope in modified_type_die
I am working on some changes to GNAT to emit hierarchical DWARF --
i.e., where entities will have simple names nested in a DW_TAG_module.
While working on this I found a couple of paths in modified_type_die
where "mod_scope" should be used, but is not. I suspect these cases
are only reachable by Ada code, as in both spots (subrange types and
base types), I believe that other languages don't generally have named
types in a non-top-level scope, and in these other situations,
mod_scope will still be correct.
gcc
* dwarf2out.cc (modified_type_die): Use mod_scope for
ranged types, base types, and array types.
Jakub Jelinek [Tue, 1 Apr 2025 14:47:37 +0000 (16:47 +0200)]
tailc: Improve tail recursion handling [PR119493]
This is a partial step towards fixing that PR.
For musttail recursive calls which have non-is_gimple_reg_type typed
parameters, the only case we've handled was if the exact parameter
was passed through (perhaps modified, but still the same PARM_DECL).
That isn't necessary, we can copy the argument to the parameter as well
(just need to watch for the use of the parameter in later arguments,
say musttail recursive call which swaps 2 structure arguments).
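Why a later use of a parameter forces a temporary can be seen by turning such a recursion into a loop by hand. `Pair`, `swap_rec` and `swap_loop` are illustrative names; this shows the source-level idea only, not what eliminate_tail_call emits:

```cpp
#include <cassert>

struct Pair {
  int a;
  int b;
};

// Recursive form: every step swaps the two structure arguments.
int swap_rec(Pair x, Pair y, int n)
{
  if (n == 0)
    return x.a * 10 + y.a;
  return swap_rec(y, x, n - 1);
}

// Hand-written loop form. Assigning x = y first would clobber the value
// that is still needed for y, so the old x is saved in a temporary.
int swap_loop(Pair x, Pair y, int n)
{
  while (n != 0)
    {
      Pair saved_x = x;
      x = y;
      y = saved_x;
      --n;
    }
  return x.a * 10 + y.a;
}
```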
The patch attempts to play safe and punts if any of the parameters are
addressable (like we do for all normal tail calls and tail recursions,
except for musttail in the posted unreviewed patch).
With this patch (at least when early inlining isn't done on not yet
optimized body) inlining should see already tail recursion optimized
body and will not have problems with SRA breaking musttail.
This version of the patch limits this for musttail tail recursions,
with intent to enable for all tail recursions in GCC 16.
2025-04-01 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119493
* tree-tailcall.cc (find_tail_calls): Don't punt on tail recursion
if some arguments don't have is_gimple_reg_type, only punt if they
have non-POD types, or volatile, or addressable or (for now) it is
not a musttail call. Set tailr_arg_needs_copy in those cases too.
(eliminate_tail_call): Copy call arguments to params if they don't
have is_gimple_reg_type, use temporaries if the argument is used
later.
(tree_optimize_tail_calls_1): Skip !is_gimple_reg_type
tailr_arg_needs_copy parameters. Formatting fix.
Jakub Jelinek [Tue, 1 Apr 2025 14:40:55 +0000 (16:40 +0200)]
combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]
The following testcase is miscompiled on x86_64-linux at -O2 by the combiner.
We have from earlier combinations
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
(const_int 0 [0])) "pr119291.c":25:15 96 {*movsi_internal}
(nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
(reg/v:SI 116 [ e ])) 96 {*movsi_internal}
(expr_list:REG_DEAD (reg/v:SI 116 [ e ])
(nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (parallel [
(set (reg:CCZ 17 flags)
(compare:CCZ (neg:SI (reg:SI 104 [ _7 ]))
(const_int 0 [0])))
(set (reg/v:SI 116 [ e ])
(neg:SI (reg:SI 104 [ _7 ])))
]) "pr119291.c":26:13 977 {*negsi_2}
(expr_list:REG_DEAD (reg:SI 104 [ _7 ])
(nil)))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
(ne:DI (reg:CCZ 17 flags)
(const_int 0 [0]))) "pr119291.c":26:13 1447 {*setcc_di_1}
(expr_list:REG_DEAD (reg:CCZ 17 flags)
(nil)))
and try_combine is called on i3 25 and i2 22 (second time)
and reach the hunk being patched with simplified i3
(insn 25 24 26 4 (parallel [
(set (pc)
(pc))
(set (reg/v:SI 116 [ e ])
(const_int 0 [0]))
]) "pr119291.c":28:13 977 {*negsi_2}
(expr_list:REG_DEAD (reg:SI 104 [ _7 ])
(nil)))
and
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
(const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
(nil))
Now, the try_combine code there attempts to split two independent
sets in newpat by moving one of them to i2.
And among other tests it checks
!modified_between_p (SET_DEST (set1), i2, i3)
which is certainly needed, if there would be say
(set (reg/v:SI 116 [ e ]) (const_int 42 [0x2a]))
in between i2 and i3, we couldn't do that, as that set would overwrite
the value set by set1 we want to move to the i2 position.
But in this case pseudo 116 isn't set in between i2 and i3, but used
(and additionally there is a REG_DEAD note for it).
This is equally bad for the move, because while the i3 insn
and later will see the pseudo value that we set, the insn in between
which uses the value will see a different value from the one that
it should see.
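The distinction between "modified in between" and "used in between" can be modelled with a toy instruction list. Everything here (`Insn`, `toy_modified_between`, `toy_used_between`) is hypothetical and far simpler than the RTL helpers; following the commit's reading, the toy "used" check treats a write as a mention too:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: one register written (-1 for none), any number read.
struct Insn {
  int writes;
  std::vector<int> reads;
};

// True if some insn strictly between positions i2 and i3 writes `reg`.
bool toy_modified_between(const std::vector<Insn> &seq, int reg,
                          std::size_t i2, std::size_t i3)
{
  for (std::size_t k = i2 + 1; k < i3; ++k)
    if (seq[k].writes == reg)
      return true;
  return false;
}

// True if some insn strictly between i2 and i3 reads or writes `reg`.
bool toy_used_between(const std::vector<Insn> &seq, int reg,
                      std::size_t i2, std::size_t i3)
{
  for (std::size_t k = i2 + 1; k < i3; ++k)
    {
      if (seq[k].writes == reg)
        return true;
      for (int r : seq[k].reads)
        if (r == reg)
          return true;
    }
  return false;
}
```

With a sequence modelled on insns 22, 23 and 25 above (23 reads pseudo 116 between i2 and i3), the first check finds nothing while the second one blocks the move.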
As we don't check for that, in the end try_combine succeeds and
changes the IL to:
(insn 22 21 23 4 (set (reg/v:SI 116 [ e ])
(const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
(nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
(reg/v:SI 116 [ e ])) 96 {*movsi_internal}
(expr_list:REG_DEAD (reg/v:SI 116 [ e ])
(nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (set (pc)
(pc)) "pr119291.c":28:13 2147483647 {NOOP_MOVE}
(nil))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
(const_int 0 [0])) "pr119291.c":28:13 95 {*movdi_internal}
(nil))
(note, the i3 got turned into a nop and try_combine also modified insn 27).
The following patch replaces the modified_between_p
tests with reg_used_between_p, my understanding is that
modified_between_p is a subset of reg_used_between_p, so one
doesn't need both.
Looking at this some more today, I think we should special case
set_noop_p because that can be put into i2 (except for the JUMP_P
violations), currently both modified_between_p (pc_rtx, i2, i3)
and reg_used_between_p (pc_rtx, i2, i3) return false.
I'll post a patch incrementally for that (but that feels like
new optimization, so probably not something that should be backported).
On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote:
> Can we constrain SET_DEST (set1/set0) to a REG_P in combine? Why
> does the comment talk about memory?
I was worried about making too risky changes this late in stage4
(and especially also for backports). Most of this code is 1992-ish.
I think many of the functions are just misnamed, the reg_ in there doesn't
match what those functions do (bet they initially supported just REGs
and later on support for other kinds of expressions was added, but haven't
done git archeology to prove that).
What we know for sure is:
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART
that is checked earlier in the condition.
Then it calls
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
XVECEXP (newpat, 0, 0))
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
XVECEXP (newpat, 0, 1))
While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p
which is also misnamed, that function handles just fine all of
REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT,
STRICT_LOW_PART, PC and even some further cases.
So, IMHO SET_DEST (set1) or SET_DEST (set0) can be certainly a REG, SUBREG
of REG, PC (at least the REG and PC cases are triggered on the testcase)
and quite possibly also MEM (SUBREG of MEM not, see below).
Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that
function for constants just returns false, for PC returns true, for REG
returns reg_set_between_p, for MEM recurses on the address, for
MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code
whether the memory could have been modified in between, for all other
rtxes recurses on the subrtxes. This part didn't change in my patch.
I've only changed those
- && !modified_between_p (SET_DEST (set{1,0}), i2, i3)
+ && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3)
where the former has been described above and clearly handles all of
REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things.
The replacement reg_used_between_p calls reg_overlap_mentioned_p on each
instruction in between i2 and i3. So, there is clearly a difference
in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p
returns unconditionally true even if there are no instructions in between,
but reg_used_between_p if there are no non-debug insns in between returns
false. Sorry for missing that, guess I should check for that (with the
exception of the noop moves which are often (set (pc) (pc)) and handled
by the incremental patch). In fact not just that, reg_used_between_p
will only return true for PC if it is mentioned anywhere in the insns
in between.
Anyway, except for that, for REG it calls refers_to_regno_p
and so should find any occurrences of any of the REG or parts of it for hard
registers, for MEM returns true if it sees any MEMs in insns in between
(conservatively), for SUBREGs apparently it relies on it being SUBREG of REG
(so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the
SUBREG_REG, PC I've already described.
Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think
already the initial
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
XVECEXP (newpat, 0, 0))
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
XVECEXP (newpat, 0, 1))
calls would have failed --enable-checking=rtl or would have misbehaved, so
I think there is no need to check for it further.
To your question why I don't use reg_referenced_p, that is because
reg_referenced_p is something to call on one insn pattern, while
reg_used_between_p is pretty much that on all insns in between two
instructions (excluding the boundaries).
So, I think it would be safer to add && SET_DEST (set{1,0}) != pc_rtx
checks to preserve former behavior, like in the following version.
2025-04-01 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/119291
* combine.cc (try_combine): For splitting of PARALLEL with
2 independent SETs into i2 and i3 sets check reg_used_between_p
of the SET_DESTs rather than just modified_between_p.
Kito Cheng [Tue, 1 Apr 2025 01:14:51 +0000 (09:14 +0800)]
RISC-V: Tweak testcase for PIE
A Linux toolchain may be configured with --enable-default-pie, which
causes many regression test failures because function names get an
@plt suffix appended (e.g. `call foo` becomes `call foo@plt`). Some code
generation also differs because of the code model, such as the address
generation for global variables, so we add -fno-pie to those testcases
to prevent that.
We may consider simply dropping the @plt suffix to avoid this entirely,
because there is no difference with or without the @plt suffix and the
linker will pick the right one. However, this is a late stage of GCC
development, so just tweaking the testcases is the best approach for now.
Changes from v1:
- Add more testcases for PIE (from rvv.exp).
- Tweak the rule for matching @plt.
Marek Polacek [Tue, 25 Mar 2025 17:36:24 +0000 (13:36 -0400)]
c++: fix missing lifetime extension [PR119383]
Since r15-8011 cp_build_indirect_ref_1 won't do the *&TARGET_EXPR ->
TARGET_EXPR folding not to change its value category. That fix seems
correct but it made us stop extending the lifetime in this testcase,
causing a wrong-code issue -- extend_ref_init_temps_1 did not see
through the extra *& because it doesn't use a tree walk.
This patch reverts r15-8011 and instead handles the problem in
build_over_call by calling force_lvalue in the is_really_empty_class
case as well as in the general case.
PR c++/119383
gcc/cp/ChangeLog:
* call.cc (build_over_call): Use force_lvalue to ensure op= returns
an lvalue.
* cp-tree.h (force_lvalue): Declare.
* cvt.cc (force_lvalue): New.
* typeck.cc (cp_build_indirect_ref_1): Revert r15-8011.
Jakub Jelinek [Tue, 1 Apr 2025 09:45:16 +0000 (11:45 +0200)]
profile: Another profiling musttail call fix [PR119535]
As the following testcase shows, EDGE_FAKE edges from musttail calls to
EXIT aren't the only edges we should ignore, we need to ignore also
edges created by the splitting of blocks for the EDGE_FAKE creation that
point from the musttail calls to the fallthrough block, which typically does
the return or with PHIs for the return value.
2025-04-01 Jakub Jelinek <jakub@redhat.com>
PR gcov-profile/119535
* profile.cc (branch_prob): Ignore any edges from bbs ending with
musttail call, rather than only EDGE_FAKE edges from those to EXIT.
Jakub Jelinek [Tue, 1 Apr 2025 09:43:16 +0000 (11:43 +0200)]
tailr: Punt on tail recursions that would break musttail [PR119493]
While working on the previous tailc patch, I've noticed the following
problem.
The testcase below fails, because we decide to tail recursion optimize
the call, but tail recursion (as documented in tree-tailcall.cc) needs to
add some result multiplication and/or addition if any tail recursion uses
accumulator, which is added right before the return.
So, if there are musttail non-recursive calls in the function, successful
tail recursion optimization will mean we'll later error on the musttail
calls. musttail recursive calls are ok, those would be tail recursion
optimized.
So, the following patch punts on all tail recursion optimizations if it
needs accumulators (add and/or mult) if there is at least one non-recursive
musttail call.
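As a source-level illustration of why accumulators matter here, the sketch
below shows a recursion whose result is multiplied after the call, next to a
hand-written loop of the kind an accumulator-based transformation could
produce. The function names and m_acc are hypothetical and this is not the
code GCC generates; the point is only that the accumulator has to be applied
right before the return, which is exactly where a musttail call in the same
function needs nothing else to happen.

```cpp
#include <cassert>

// Recursive form: the multiplication happens after the recursive call
// returns, so the call is not in tail position as written.
unsigned long product_recursive(unsigned n) {
  if (n <= 1)
    return 1;
  return n * product_recursive(n - 1);
}

// Hand-written equivalent using a multiplicative accumulator: the pending
// multiplications are folded into m_acc, which is applied to the base-case
// value right before the single return.
unsigned long product_with_accumulator(unsigned n) {
  unsigned long m_acc = 1;  // hypothetical multiplicative accumulator
  while (n > 1) {
    m_acc *= n;
    --n;
  }
  return m_acc * 1UL;  // base-case value scaled by the accumulator
}

// Checks that both forms agree for small inputs (results fit in 32 bits).
void check_forms_agree() {
  for (unsigned n = 0; n <= 12; ++n)
    assert(product_recursive(n) == product_with_accumulator(n));
}
```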
2025-04-01 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119493
* tree-tailcall.cc (tree_optimize_tail_calls_1): Ignore tail recursion
candidates which need accumulators if there is at least one musttail
non-recursive call.
Jakub Jelinek [Tue, 1 Apr 2025 09:40:58 +0000 (11:40 +0200)]
gimple-low: Diagnose assume attr expressions defining labels which are used as unary && operands outside of those [PR119537]
The following testcases ICE on invalid code which defines
labels inside of statement expressions and then uses &&label
from code outside of the statement expressions.
The C++ FE diagnoses that with a warning (not specifically for
assume attribute, generically about taking address of a label
outside of a statement expression so computed goto could violate
the requirement that statement expression is not entered from
outside of it through a jump into it), the C FE doesn't diagnose
anything.
Normal direct gotos to such labels are diagnosed by both C and C++.
In the assume attribute case it is actually worse than for
addresses of labels in normal statement expressions, in that case
the labels are still in the current function, so invalid program
can still jump to those (and in case of OpenMP/OpenACC where it
is also invalid and stuff is moved to a separate function, such
movement is done post cfg discovery of FORCED_LABELs and worst
case one can run into cases which fail to assemble, but I haven't
succeeded in creating ICE for that).
For assume at -O0 we'd just throw away the assume expression if
it is not a simple condition and so the label is then not defined
anywhere and we ICE during cfg pass.
The gimplify.cc hunks fix that, as we don't have FORCED_LABELs
discovery done yet, it preserves all assume expressions which contain
used user labels.
With that we ICE during IRA, which is upset about an indirect jump
to a label which doesn't exist.
So, the gimple-low.cc hunks add diagnostics of the problem, it gathers
uids of all the user used labels inside of the assume expressions (usually
none) and if it finds any, walks the IL to find uses of those from outside
of those expressions now outlined into separate magic functions.
2025-04-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/119537
* gimplify.cc (find_used_user_labels): New function.
(gimplify_call_expr): Don't remove complex assume expression at -O0
if it defines any user labels.
* gimple-low.cc: Include diagnostic-core.h.
(assume_labels): New variable.
(diagnose_assume_labels): New function.
(lower_function_body): Call it via walk_gimple_seq if assume_labels
is non-NULL, then BITMAP_FREE assume_labels.
(find_assumption_locals_r): Record in assume_labels uids of user
labels defined in assume attribute expressions.
* c-c++-common/pr119537-1.c: New test.
* c-c++-common/pr119537-2.c: New test.
Thomas Schwinge [Mon, 31 Mar 2025 07:55:14 +0000 (09:55 +0200)]
GCN: Don't emit weak undefined symbols [PR119369]
This resolves all instances of PR119369
"GCN: weak undefined symbols -> execution test FAIL, 'HSA_STATUS_ERROR_VARIABLE_UNDEFINED'";
for all affected test cases, the execution test status progresses FAIL -> PASS.
This however also causes a small number of (expected) regressions, very similar
to GCC/nvptx:
[-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C -std=c++17 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C -std=c++26 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/abi/pure-virtual1.C -std=c++98 (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/attr-weakref-1.c (test for excess errors)
[-FAIL:-]{+UNRESOLVED:+} gcc.dg/attr-weakref-1.c [-execution test-]{+compilation failed to produce executable+}
This fixes a few hundred compilation/linking FAILs (similar to PR69506),
where the GCN/LLVM 'ld' reported:
ld: error: relocation R_AMDGPU_REL32_LO cannot be used against symbol '_ZGTtnam'; recompile with -fPIC
>>> defined in [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a(cow-stdexcept.o)
>>> referenced by cow-stdexcept.cc:259 ([...]/libstdc++-v3/src/c++11/cow-stdexcept.cc:259)
>>> cow-stdexcept.o:(_txnal_cow_string_C1_for_exceptions(void*, char const*, void*)) in archive [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a
ld: error: relocation R_AMDGPU_REL32_HI cannot be used against symbol '_ZGTtnam'; recompile with -fPIC
>>> defined in [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a(cow-stdexcept.o)
>>> referenced by cow-stdexcept.cc:259 ([...]/source-gcc/libstdc++-v3/src/c++11/cow-stdexcept.cc:259)
>>> cow-stdexcept.o:(_txnal_cow_string_C1_for_exceptions(void*, char const*, void*)) in archive [...]/amdgcn-amdhsa/./libstdc++-v3/src/.libs/libstdc++.a
[...]
..., which is:
$ c++filt _ZGTtnam
transaction clone for operator new[](unsigned long)
..., and similarly for other libitm symbols.
However, the affected test cases, if applicable, then run into execution test
FAILs, due to PR119369
"GCN: weak undefined symbols -> execution test FAIL, 'HSA_STATUS_ERROR_VARIABLE_UNDEFINED'".
PR target/119369
libstdc++-v3/
* config/cpu/gcn/cpu_defines.h: New.
* configure.host [GCN] (cpu_defines_dir): Point to it.
Richard Biener [Mon, 31 Mar 2025 12:56:25 +0000 (14:56 +0200)]
target/119549 - fixup handling of -mno-sse4 in target attribute
The following fixes ix86_valid_target_attribute_inner_p to properly
handle target("no-sse4") via OPT_mno_sse4 rather than as unset OPT_msse4.
I've added asserts to ix86_handle_option that RejectNegative is honored
for both.
PR target/119549
* common/config/i386/i386-common.cc (ix86_handle_option):
Assert that both OPT_msse4 and OPT_mno_sse4 are never unset.
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Process negated OPT_msse4 as OPT_mno_sse4.
Jakub Jelinek [Tue, 1 Apr 2025 08:05:18 +0000 (10:05 +0200)]
libquadmath: Avoid old-style function definition warnings
I've noticed
../../../libquadmath/printf/gmp-impl.h:104:18: warning: old-style function definition [-Wold-style-definition]
../../../libquadmath/printf/gmp-impl.h:104:18: warning: old-style function definition [-Wold-style-definition]
../../../libquadmath/printf/gmp-impl.h:104:18: warning: old-style function definition [-Wold-style-definition]
../../../libquadmath/strtod/strtod_l.c:456:22: warning: old-style function definition [-Wold-style-definition]
warnings during bootstrap (clearly since the switch to -std=gnu23 by default).
The following patch fixes those in libquadmath, the only other warnings are
in zlib.
Hu, Lin1 [Wed, 26 Mar 2025 08:15:52 +0000 (16:15 +0800)]
i386: Add attr_isa for vaes patterns to sync with attr gpr16. [PR119473]
vaes patterns with the jm constraint and gpr16 attr require the "isa"
attr to distinguish avx/avx512 alternatives in ix86_memory_address_reg_class.
Also add the missing type and mode attributes for those vaes patterns.
gcc/ChangeLog:
PR target/119473
* config/i386/sse.md
(vaesdec_<mode>): Set attr "isa" as "avx,vaes_avx512vl", "type" as
"sselog1", "mode" as "TI".
(vaesdeclast_<mode>): Ditto.
(vaesenc_<mode>): Ditto.
(vaesenclast_<mode>): Ditto.
gcc/testsuite/ChangeLog:
PR target/119473
* gcc.target/i386/pr119473.c: New test.
Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
Monk Chiang [Tue, 4 Feb 2025 07:29:17 +0000 (15:29 +0800)]
RISC-V: Fix wrong LMUL when only implicit zve32f.
According to Section 3.4.2, Vector Register Grouping, in the RISC-V
Vector Specification, the rule for LMUL is LMUL >= SEW/ELEN.
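The rule as stated above can be checked with integer arithmetic only. The
following is a minimal sketch, not code from the patch: lmul_satisfies_rule
is a hypothetical helper, LMUL is passed as a fraction lmul_num/lmul_den,
and ELEN = 32 is assumed for zve32f.

```cpp
// Hypothetical helper: LMUL is given as the fraction lmul_num / lmul_den
// (for example 1/4 is lmul_num = 1, lmul_den = 4).
// Returns true when LMUL >= SEW / ELEN, compared by cross-multiplication
// so that no division or floating point is needed.
constexpr bool lmul_satisfies_rule(unsigned sew, unsigned elen,
                                   unsigned lmul_num, unsigned lmul_den) {
  return lmul_num * elen >= sew * lmul_den;
}

// With ELEN = 32 and SEW = 8 the smallest permitted LMUL is 1/4 ...
static_assert(lmul_satisfies_rule(8, 32, 1, 4), "LMUL = 1/4 is allowed");
// ... so LMUL = 1/8 violates the rule,
static_assert(!lmul_satisfies_rule(8, 32, 1, 8), "LMUL = 1/8 is too small");
// while with ELEN = 64 the same LMUL = 1/8 satisfies it.
static_assert(lmul_satisfies_rule(8, 64, 1, 8), "LMUL = 1/8 is allowed");
```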
Changes since V2:
- Add check on vector-iterators.md
- Add one more testcase to check the VLS use correct mode.
gcc/ChangeLog:
* config/riscv/riscv-v.cc: Add restrict for insert LMUL.
* config/riscv/riscv-vector-builtins-types.def:
Use RVV_REQUIRE_ELEN_64 to check LMUL number.
* config/riscv/riscv-vector-switch.def: Likewise.
* config/riscv/vector-iterators.md: Check TARGET_VECTOR_ELEN_64
rather than "TARGET_MIN_VLEN > 32" for all iterators.
Jonathan Wakely [Fri, 28 Mar 2025 15:41:41 +0000 (15:41 +0000)]
libstdc++: Fix -Warray-bounds warning in std::vector<bool> [PR110498]
In this case, we need to tell the compiler that the current size is not
larger than the new size so that all the existing elements can be copied
to the new storage. This avoids bogus warnings about overflowing the new
storage when the compiler can't tell that that cannot happen.
We might as well also hoist the loads of begin() and end() before the
allocation too. All callers will have loaded at least begin() before
calling _M_reallocate.
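The general technique can be sketched on a toy buffer as follows. This is
not the libstdc++ code: ToyBuffer and its members are hypothetical, and
__builtin_unreachable is the GCC/Clang builtin, so the sketch assumes one of
those compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a growing buffer; not libstdc++'s vector<bool>.
struct ToyBuffer {
  unsigned char *data = nullptr;
  std::size_t size = 0;

  ToyBuffer() = default;
  ToyBuffer(const ToyBuffer &) = delete;
  ToyBuffer &operator=(const ToyBuffer &) = delete;
  ~ToyBuffer() { delete[] data; }

  // Grow the storage to new_cap bytes.  The caller guarantees
  // size <= new_cap and new_cap > 0.
  void reallocate(std::size_t new_cap) {
    assert(new_cap > 0);
    // Load the current state before the allocation, so the compiler does
    // not have to assume the allocation call might have changed it.
    unsigned char *old_data = data;
    std::size_t old_size = size;

    // State that the old contents always fit in the new storage.  Reaching
    // this branch would be undefined behavior, so callers must never do so.
    if (old_size > new_cap)
      __builtin_unreachable();

    unsigned char *new_data = new unsigned char[new_cap];
    if (old_size != 0)
      std::memcpy(new_data, old_data, old_size);
    delete[] old_data;
    data = new_data;
  }
};
```

With the condition stated up front, the optimizer can conclude that the copy
never writes past new_cap bytes, which is the kind of information that lets
it drop the dead paths behind such bogus warnings.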
libstdc++-v3/ChangeLog:
PR libstdc++/110498
* include/bits/vector.tcc (vector<bool, A>::_M_reallocate):
Hoist loads of begin() and end() before allocation and use them
to state an unreachable condition.
* testsuite/23_containers/vector/bool/capacity/110498.cc: New
test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Fri, 28 Mar 2025 15:41:41 +0000 (15:41 +0000)]
libstdc++: Fix -Wstringop-overread warning in std::vector<bool> [PR114758]
As in r13-4393-gcca06f0d6d76b0 and a few other commits, we can avoid
bogus warnings in std::vector<bool> by hoisting some loads to before the
allocation that calls operator new. This means that the compiler has
enough info to remove the dead branches that trigger bogus warnings.
On trunk this is only needed with -fno-assume-sane-operators-new-delete
but it will help on the branches where that option doesn't exist.
libstdc++-v3/ChangeLog:
PR libstdc++/114758
* include/bits/vector.tcc (vector<bool, A>::_M_fill_insert):
Hoist loads of begin() and end() before allocation.
* testsuite/23_containers/vector/bool/capacity/114758.cc: New
test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Mon, 31 Mar 2025 14:07:12 +0000 (15:07 +0100)]
libstdc++: Fix bootstrap failure for cross without tm.tm_zone [PR119550]
In r15-8491-g778c28c70f8573 I added a use of the Autoconf macro
AC_STRUCT_TIMEZONE, but that requires a link-test for the global tzname
object if tm.tm_zone isn't supported. That link-test isn't allowed for
cross-compilation, so bootstrap fails if tm.tm_zone isn't supported.
Since libstdc++ only cares about tm.tm_zone and won't use tzname anyway,
we don't need the link-test. Replace AC_STRUCT_TIMEZONE with a custom
macro that only checks for tm.tm_zone. We can improve on the Autoconf
macro by checking it's a suitable type, which isn't actually checked by
AC_STRUCT_TIMEZONE.
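The real check is an Autoconf macro, but the idea of testing both that the
member exists and that it has a usable type can be sketched in C++ with a
compile-only trait. Everything below is hypothetical: has_string_tm_zone and
the three structs are illustrative names, and "suitable type" is assumed
here to mean convertible to const char*.

```cpp
#include <type_traits>
#include <utility>

// Primary template: assume there is no usable tm_zone member.
template <typename T, typename = void>
struct has_string_tm_zone : std::false_type {};

// Selected only when T has a member tm_zone convertible to const char*.
// No linking is involved; the answer comes from compilation alone.
template <typename T>
struct has_string_tm_zone<
    T, std::enable_if_t<std::is_convertible<
           decltype(std::declval<T &>().tm_zone), const char *>::value>>
    : std::true_type {};

// Hypothetical structs standing in for different platforms' struct tm.
struct TmWithZone { int tm_sec; const char *tm_zone; };
struct TmWithoutZone { int tm_sec; };
struct TmWithIntZone { int tm_sec; int tm_zone; };

static_assert(has_string_tm_zone<TmWithZone>::value, "member is usable");
static_assert(!has_string_tm_zone<TmWithoutZone>::value, "member is missing");
static_assert(!has_string_tm_zone<TmWithIntZone>::value, "wrong member type");
```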
libstdc++-v3/ChangeLog:
PR libstdc++/119550
* acinclude.m4 (GLIBCXX_STRUCT_TM_TM_ZONE): New macro.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_STRUCT_TM_TM_ZONE.
* include/bits/chrono_io.h (__formatter_chrono::_M_c): Check
_GLIBCXX_USE_STRUCT_TM_TM_ZONE instead of
_GLIBCXX_HAVE_STRUCT_TM_TM_ZONE.