Tomasz Kamiński [Mon, 28 Apr 2025 06:53:59 +0000 (08:53 +0200)]
libstdc++: Fix mingw build by using _M_span [PR119970]
Commit r16-142-g01e5ef3e8b9128 changed the return type of _Str_sink::view()
to basic_string_view<_CharT>. Mutable access is now provided by the _M_span
function, which the mingw path now uses.
Tomasz Kamiński [Thu, 20 Mar 2025 08:02:03 +0000 (09:02 +0100)]
libstdc++: Strip reference and cv-qual in range deduction guides for maps.
This implements the part of LWG4223 that adjusts the deduction guides for map types
(map, unordered_map, flat_map and their non-unique equivalents) from a "range"
(std::from_range, iterator pair), such that reference and cv qualification are
stripped from the elements of the pair-like value_type.
* include/bits/ranges_base.h (__detail::__range_key_type):
Replace remove_const_t with remove_cvref_t.
(__detail::__range_mapped_type): Apply remove_cvref_t.
* include/bits/stl_iterator.h (__detail::__iter_key_t):
Replace remove_const_t with __remove_cvref_t.
(__detail::__iter_val_t): Apply __remove_cvref_t.
* testsuite/23_containers/flat_map/1.cc: New tests.
* testsuite/23_containers/flat_multimap/1.cc: New tests.
* testsuite/23_containers/map/cons/deduction.cc: New tests.
* testsuite/23_containers/map/cons/from_range.cc: New tests.
* testsuite/23_containers/multimap/cons/deduction.cc: New tests.
* testsuite/23_containers/multimap/cons/from_range.cc: New tests.
* testsuite/23_containers/unordered_map/cons/deduction.cc: New tests.
* testsuite/23_containers/unordered_map/cons/from_range.cc: New tests.
* testsuite/23_containers/unordered_multimap/cons/deduction.cc:
New tests.
* testsuite/23_containers/unordered_multimap/cons/from_range.cc:
New tests.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Tue, 18 Mar 2025 15:10:48 +0000 (16:10 +0100)]
libstdc++: Implement missing allocator-aware constructors for unordered containers.
This patch implements the remainder of LWG2713 (after r15-8293-g64f5c854597759)
by adding the missing allocator-aware versions of the unordered associative container
constructors accepting a pair of iterators or an initializer_list, and the corresponding
deduction guides.
libstdc++-v3/ChangeLog:
* include/bits/unordered_map.h (unordered_map):
Define constructors accepting:
(_InputIterator, _InputIterator, const allocator_type&),
(initializer_list<value_type>, const allocator_type&),
(unordered_multimap): Likewise.
* include/debug/unordered_map (unordered_map): Likewise.
(unordered_multimap): Likewise.
* include/bits/unordered_set.h (unordered_set):
Define constructors and deduction guide accepting:
(_InputIterator, _InputIterator, const allocator_type&),
(initializer_list<value_type>, const allocator_type&),
(unordered_multiset): Likewise.
* include/debug/unordered_set (unordered_set): Likewise.
(unordered_multiset): Likewise.
* testsuite/23_containers/unordered_map/cons/66055.cc: New tests.
* testsuite/23_containers/unordered_map/cons/deduction.cc: New tests.
* testsuite/23_containers/unordered_multimap/cons/66055.cc: New tests.
* testsuite/23_containers/unordered_multimap/cons/deduction.cc: New
tests.
* testsuite/23_containers/unordered_multiset/cons/66055.cc: New tests.
* testsuite/23_containers/unordered_multiset/cons/deduction.cc: New
tests.
* testsuite/23_containers/unordered_set/cons/66055.cc: New tests.
* testsuite/23_containers/unordered_set/cons/deduction.cc: New tests.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jakub Jelinek [Mon, 28 Apr 2025 07:22:50 +0000 (09:22 +0200)]
tailc: Improve tail recursion handling [PR119493]
Here is a patch to improve the tail recursion handling also for
non-musttail calls.
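For orientation, a hedged sketch of the kind of source this concerns: a self-recursive call in tail position whose aggregate argument is modified before the call rather than passed through unchanged. The names Acc and sum_to are made up for this example, and whether a given GCC version actually turns the recursion into a loop depends on the version and optimization flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical accumulator; as an aggregate it is passed in memory rather
// than as a single scalar register value.
struct Acc {
  std::uint64_t sum;
  std::uint64_t steps;
  std::uint64_t padding[6];
};

// The recursive call is the last action of the function, and its struct
// argument has been updated first (it is not just passed through).
std::uint64_t sum_to(std::uint64_t n, Acc acc)
{
  if (n == 0)
    return acc.sum;
  acc.sum += n;
  acc.steps += 1;
  return sum_to(n - 1, acc);
}

// Small self-check of the function's result.
inline void check_sum_to()
{
  assert(sum_to(10, Acc{}) == 55);
}
```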
2025-04-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/119493
* tree-tailcall.cc (find_tail_calls): Handle non-gimple_reg_type
arguments which aren't just passed through for tail recursions
even for non-musttail calls.
Lewis Hyatt [Tue, 11 Feb 2025 18:45:41 +0000 (13:45 -0500)]
c-family: Improve location for -Wunknown-pragmas in a _Pragma [PR118838]
The warning for -Wunknown-pragmas is issued at the location provided by
libcpp to the def_pragma() callback. This location is
cpp_reader::directive_line, which is a location for the start of the line
only; it is also not a valid location in case the unknown pragma was lexed
from a _Pragma string. These factors make it impossible to suppress
-Wunknown-pragmas via _Pragma("GCC diagnostic...") directives on the same
source line, as in the PR and the test case. Address that by issuing the
warning at a better location returned by cpp_get_diagnostic_override_loc().
libcpp already maintains this location to handle _Pragma-related diagnostics
internally; this patch only needed to add a publicly accessible version of it.
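A minimal sketch of the usage pattern described above, where the suppression and the unknown pragma end up on the same source line because both come from one macro expansion. The macro name QUIET_UNKNOWN_PRAGMA and the pragma text are invented for this example; it is not the test case from the PR.

```cpp
// Hypothetical macro: everything it expands to lands on one source line.
#define QUIET_UNKNOWN_PRAGMA(text)                            \
  _Pragma("GCC diagnostic push")                              \
  _Pragma("GCC diagnostic ignored \"-Wunknown-pragmas\"")     \
  _Pragma(text)                                               \
  _Pragma("GCC diagnostic pop")

// An unrecognized pragma, intended to be silenced by the wrapper.
QUIET_UNKNOWN_PRAGMA("some_vendor_pragma example")

constexpr int answer() { return 42; }
```

Compiled with -Wunknown-pragmas (or -Wall), the intent is that no warning is reported for the wrapped pragma; compilers without the fix may still warn.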
gcc/c-family/ChangeLog:
PR c/118838
* c-lex.cc (cb_def_pragma): Call cpp_get_diagnostic_override_loc()
to get a valid location at which to issue -Wunknown-pragmas, in case
it was triggered from a _Pragma.
LIU Hao [Sun, 27 Apr 2025 10:18:34 +0000 (18:18 +0800)]
gcc: For Windows x86-32, always attempt to realign stack regardless of SSE
For Windows x86-32 targets, the Microsoft ABI only guarantees that the stack
is aligned to 4-byte boundaries. GCC knows about the default alignment of the
stack. However, before this commit, it did not realign the stack unless SSE
was also enabled.
When a stricter (larger) alignment is requested, it is always necessary to
realign the stack, as is already done for Solaris.
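To make the requirement concrete, here is a small sketch (not taken from the PR) of a function whose local variable asks for more alignment than a 4-byte-aligned incoming stack provides. The function name local_is_aligned is an assumption of this example.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if an over-aligned local buffer really sits on a 32-byte
// boundary. For this to hold when the caller only guarantees 4-byte stack
// alignment, the compiler has to realign the stack in this function.
bool local_is_aligned()
{
  alignas(32) unsigned char buf[64] = {};
  auto addr = reinterpret_cast<std::uintptr_t>(&buf[0]);
  return addr % 32 == 0;
}
```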
Reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111107#c14
Signed-off-by: LIU Hao <lh_mouse@126.com>
Signed-off-by: Jonathan Yong <10walls@gmail.com>
gcc/ChangeLog:
PR target/111107
* config/i386/cygming.h (STACK_REALIGN_DEFAULT): Copy from sol2.h.
c++/modules: Ensure DECL_FRIEND_CONTEXT is streamed [PR119939]
An instantiated friend function relies on DECL_FRIEND_CONTEXT being set
to be able to recover the template arguments of the class that
instantiated it, despite not being a template itself. This patch
ensures that this data is streamed even when DECL_CLASS_SCOPE_P is not
true.
PR c++/119939
gcc/cp/ChangeLog:
* module.cc (trees_out::lang_decl_vals): Also stream
lang->u.fn.context when DECL_UNIQUE_FRIEND_P.
(trees_in::lang_decl_vals): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/modules/concept-11_a.H: New test.
* g++.dg/modules/concept-11_b.C: New test.
Pan Li [Thu, 17 Apr 2025 02:27:17 +0000 (10:27 +0800)]
RISC-V: Extract vector stepped for expand_const_vector [NFC]
Because expand_const_vector is quite long (about 500 lines)
and complicated, we would like to extract the different cases
into different functions. For example, the const vector stepped case
will be extracted into expand_const_vector_stepped.
The following test suite passed for this patch:
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Extract
const vector stepped into separated func.
(expand_const_vector_single_step_npatterns): Add new func
to take care of single step.
(expand_const_vector_interleaved_stepped_npatterns): Add new
func to take care of interleaved step.
(expand_const_vector_stepped): Add new func to take care of
const vector stepped.
Pan Li [Wed, 16 Apr 2025 07:47:21 +0000 (15:47 +0800)]
RISC-V: Extract vector duplicate for expand_const_vector [NFC]
Because expand_const_vector is quite long (about 500 lines)
and complicated, we would like to extract the different cases
into different functions. For example, the const vector duplicate case
will be extracted into expand_const_vector_duplicate, with
expand_const_vector_duplicate_repeating and
expand_const_vector_duplicate_default as the underlying functions.
The following test suite passed for this patch:
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector_duplicate_repeating):
Add new func to take care of vector duplicate with repeating.
(expand_const_vector_duplicate_default): Add new func to take
care of default const vector duplicate.
(expand_const_vector_duplicate): Add new func to take care
of all const vector duplicate.
(expand_const_vector): Extract const vector duplicate into
separated function.
Pan Li [Wed, 16 Apr 2025 06:43:23 +0000 (14:43 +0800)]
RISC-V: Extract vec_series for expand_const_vector [NFC]
Because expand_const_vector is quite long (about 500 lines)
and complicated, we would like to extract the different cases
into different functions. For example, the const vec_series case
will be extracted into expand_const_vec_series.
The following test suite passed for this patch:
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vec_series): Add new
func to take care of the const vec_series.
(expand_const_vector): Extract const vec_series into separated
function.
Pan Li [Wed, 16 Apr 2025 03:16:21 +0000 (11:16 +0800)]
RISC-V: Extract vec_duplicate for expand_const_vector [NFC]
Because expand_const_vector is quite long (about 500 lines)
and complicated, we would like to extract the different cases
into different functions. For example, the const vec_duplicate case
will be extracted into expand_const_vec_duplicate.
The following test suite passed for this patch:
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Extract
const vec_duplicate into separated function.
(expand_const_vec_duplicate): Add new func to take care
of the const vec_duplicate.
Jan Hubicka [Sat, 26 Apr 2025 20:10:19 +0000 (22:10 +0200)]
Fix i386 vectorizer cost of FP scalar MAX_EXPR and MIN_EXPR
I introduced a bug with last-minute cleanups unifying the scalar and vector SSE conditional.
This patch fixes it and restores the cost of 1 for SSE scalar MIN/MAX.
Bootstrapped/regtested x86_64-linux, committed.
gcc/ChangeLog:
PR target/105275
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Fix cost of FP scalar
MAX_EXPR and MIN_EXPR
This patch marks m32c*-*-* targets obsolete in GCC 16. The target has
not had a maintainer since GCC 9, and fails to compile even the
simplest of functions since GCC 8 (reported in PR83670).
contrib/ChangeLog:
* config-list.mk: Add m32c*-*-* to the list of obsoleted targets.
gcc/ChangeLog:
* config.gcc (LIST): --enable-obsolete for m32c-elf.
This adds the simplification of a ZERO_EXTEND of an AND. This optimization
was already handled in combine via combine_simplify_rtx and the handling
there of compound_operations (ZERO_EXTRACT).
Built and tested for aarch64-linux-gnu.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_unary_operation_1) <case ZERO_EXTEND>:
Add simplification for AND with a constant.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Dimitar Dimitrov [Sat, 11 Jan 2025 16:03:15 +0000 (18:03 +0200)]
testsuite: Skip tests incompatible with generic thunk support
Some backends do not define TARGET_ASM_OUTPUT_MI_THUNK. But the generic
thunk support cannot emit code for calling variadic methods of
multiple-inheritance classes. Example error for pru-unknown-elf:
.../gcc/gcc/testsuite/g++.dg/ipa/pr83549.C:7:24: error: generic thunk code fails for method 'virtual void C::_ZThn4_N1C3fooEz(...)' which uses '...'
Disable the affected tests for all targets which do not define
TARGET_ASM_OUTPUT_MI_THUNK.
Ensured that test results with and without this patch for
x86_64-pc-linux-gnu are the same.
Jonathan Wakely [Fri, 25 Apr 2025 14:49:22 +0000 (15:49 +0100)]
libstdc++: Micro-optimization for std::addressof
Currently std::addressof calls std::__addressof which uses
__builtin_addressof. This leads me to prefer std::__addressof in some
code, to avoid the extra hop. But it's not as though the implementation
of std::__addressof is complicated and reusing it avoids any code
duplication.
So let's just make std::addressof use the built-in directly, and then we
only need to use std::__addressof in C++98 code. (Transitioning existing
uses of std::__addressof to std::addressof isn't included in this
change.)
The front end does fold std::addressof with -ffold-simple-inlines but
this change still seems worthwhile.
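As a reminder of what std::addressof provides regardless of how it is implemented internally, here is a small sketch with a type that overloads unary operator&. The type Odd and the helper function are invented for this example.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that hijacks the address-of operator.
struct Odd {
  int value = 0;
  Odd* operator&() { return nullptr; }
};

// std::addressof obtains the real address even though `&o` does not.
inline bool addressof_sees_through_overload()
{
  Odd o;
  Odd* real = std::addressof(o);
  return real != nullptr          // the true address of o
      && &o == nullptr            // the overloaded operator lies
      && &real->value == &o.value;
}
```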
libstdc++-v3/ChangeLog:
* include/bits/move.h (addressof): Use __builtin_addressof
directly.
Jonathan Wakely [Fri, 25 Apr 2025 14:57:56 +0000 (15:57 +0100)]
libstdc++: Remove c++26 dg-error lines for -Wdelete-incomplete errors
This fixes:
FAIL: tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc -std=gnu++26 (test for errors, line 283)
FAIL: tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc -std=gnu++26 (test for errors, line 305)
This is another consequence of r16-133-g8acea9ffa82ed8 which prevents
the -Wdelete-incomplete errors that happen after the first error.
libstdc++-v3/ChangeLog:
* testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc:
Remove dg-error directives for additional c++26 errors.
Andrew Pinski [Tue, 22 Apr 2025 22:13:39 +0000 (15:13 -0700)]
match: Move `(cmp (cond @0 @1 @2) @3)` simplification after the bool compare simplifcation
This moves the `(cmp (cond @0 @1 @2) @3)` simplification to be after the boolean comparison
simplifications so that we don't end up simplifying into the same thing for a GIMPLE_COND.
gcc/ChangeLog:
* match.pd: Move `(cmp (cond @0 @1 @2) @3)` simplification after
the bool comparison simplifications.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Harald Anlauf [Thu, 24 Apr 2025 19:28:35 +0000 (21:28 +0200)]
Fortran: fix procedure pointer handling with -fcheck=pointer [PR102900]
PR fortran/102900
gcc/fortran/ChangeLog:
* trans-decl.cc (gfc_generate_function_code): Use sym->result
when generating fake result decl for functions returning
allocatable or pointer results.
* trans-expr.cc (gfc_conv_procedure_call): When checking the
pointer status of an actual argument passed to a non-allocatable,
non-pointer dummy which is of type CLASS, do not check the
class container of the actual if it is just a procedure pointer.
(gfc_trans_pointer_assignment): Fix treatment of assignment to
NULL of a procedure pointer.
gcc/testsuite/ChangeLog:
* gfortran.dg/proc_ptr_52.f90: Add -fcheck=pointer to options.
* gfortran.dg/proc_ptr_57.f90: New test.
Jason Merrill [Mon, 14 Apr 2025 16:18:06 +0000 (12:18 -0400)]
c++: pruning non-captures in noexcept lambda [PR119764]
The patch for PR87185 fixed the ICE without fixing the underlying problem,
that we were failing to find the declaration of the capture proxy that we
are trying to decide whether to prune. Fixed by looking at the right index
in stmt_list_stack.
Since this changes captures, it changes the ABI of noexcept lambdas; we
haven't worked hard to maintain lambda capture ABI, but it's easy enough to
control here.
PR c++/119764
PR c++/87185
gcc/cp/ChangeLog:
* lambda.cc (insert_capture_proxy): Handle noexcept lambda.
(prune_lambda_captures): Likewise, in ABI v21.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-noexcept1.C: New test.
Jason Merrill [Tue, 22 Apr 2025 20:37:30 +0000 (16:37 -0400)]
c++: add -fabi-version=21
I'm about to add a bugfix that changes the ABI of noexcept lambdas, so first
let's add the new ABI version. And I think it's time to update the
compatibility version; let's bump to GCC 13, before the addition of concepts
mangling.
gcc/ChangeLog:
* common.opt: Add ABI v21.
gcc/c-family/ChangeLog:
* c-opts.cc (c_common_post_options): Bump default ABI to 21
and compat ABI to 18.
gcc/testsuite/ChangeLog:
* g++.dg/abi/macro0.C: Update for -fabi-version=21.
Andrew Pinski [Sat, 19 Apr 2025 00:10:12 +0000 (17:10 -0700)]
phiopt: Remove calls.h include [PR119811]
When the patch https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660807.html was reworked into r15-3047-g404d947d8ddd3c,
the include of calls.h was kept, although it was no longer needed.
Pushed as obvious.
PR tree-optimization/119811
gcc/ChangeLog:
* tree-ssa-phiopt.cc: Remove calls.h include.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Tomasz Kamiński [Wed, 23 Apr 2025 11:17:09 +0000 (13:17 +0200)]
libstdc++: Minimalize temporary allocations when width is specified [PR109162]
When a width parameter is specified for formatting a range, a tuple or the escaped
presentation of a string, we used to format the characters to a temporary string,
and then write the produced sequence padded according to the spec. However, once the
estimated width of the formatted representation of the input is larger than the value
of the spec width, it can be written directly to the output. This limits the size of
the required allocation, especially for large ranges.
Similarly, if a precision (maximum) width is provided for the string presentation,
only a prefix of the sequence with estimated width not greater than the precision needs
to be buffered.
To realize the above, this commit implements a new _Padding_sink specialization.
This sink holds an output iterator, a value of padding width, (optionally)
maximum width and a string buffer inherited from _Str_sink.
Then any incoming characters are treated in one of the following ways, depending on
the estimated width W of the written sequence:
* written to string if W is smaller than padding width and maximum width (if present)
* ignored, if W is greater than maximum width
* written to output iterator, if W is greater than padding width
The padding sink is used instead of _Str_sink in __format::__format_padded,
__formatter_str::_M_format_escaped functions.
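The three rules above can be sketched roughly as follows. This is a simplified illustration and not libstdc++'s _Padding_sink: the class name PaddingSinkSketch is invented, the output is a std::string instead of an output iterator, every char is assumed to have width 1, and only left-aligned padding is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical, simplified sink for illustration only.
class PaddingSinkSketch {
public:
  PaddingSinkSketch(std::string& out, std::size_t pad_width,
                    std::optional<std::size_t> max_width = std::nullopt)
  : out_(out), pad_width_(pad_width), max_width_(max_width) {}

  void write(std::string_view chars)
  {
    for (char c : chars) {
      if (max_width_ && written_ >= *max_width_)
        return;                      // beyond maximum width: ignored
      ++written_;
      if (direct_) {                 // padding already ruled out
        out_.push_back(c);
        continue;
      }
      buffer_.push_back(c);          // still might need padding: buffer
      if (written_ > pad_width_) {   // wider than padding width: flush
        out_ += buffer_;
        buffer_.clear();
        direct_ = true;
      }
    }
  }

  // Emits whatever is still buffered, followed by fill characters.
  void finish(char fill = ' ')
  {
    if (direct_)
      return;
    assert(written_ <= pad_width_);
    out_ += buffer_;
    out_.append(pad_width_ - written_, fill);
    buffer_.clear();
  }

private:
  std::string& out_;
  std::size_t pad_width_;
  std::optional<std::size_t> max_width_;
  std::string buffer_;
  std::size_t written_ = 0;
  bool direct_ = false;
};
```

In this sketch the buffer never holds more than pad_width characters plus one, which is the point of the change: the temporary allocation is bounded by the requested width rather than by the size of the formatted input.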
Furthermore __formatter_str::_M_format implementation was reworked, to:
* reduce number of instantiations by delegating to _Rg& and const _Rg& overloads,
* non-debug presentation is written to _Out directly or via _Padding_sink
* if maximum width is specified for debug format with non-unicode encoding,
string size is limited to that number.
PR libstdc++/109162
libstdc++-v3/ChangeLog:
* include/bits/formatfwd.h (__simply_formattable_range): Moved from
std/format.
* include/std/format (__formatter_str::_format): Extracted escaped
string handling to separate method...
(__formatter_str::_M_format_escaped): Use __Padding_sink.
(__formatter_str::_M_format): Adjusted implementation.
(__formatter_str::_S_trunc): Extracted as namespace function...
(__format::_truncate): Extracted from __formatter_str::_S_trunc.
(__format::_Seq_sink): Removed forward declarations, made members
protected and non-final.
(_Seq_sink::_M_trim): Define.
(_Seq_sink::_M_span): Renamed from view.
(_Seq_sink::view): Returns string_view instead of span.
(__format::_Str_sink): Moved after _Seq_sink.
(__format::__format_padded): Use _Padding_sink.
* testsuite/std/format/debug.cc: Add timeout and new tests.
* testsuite/std/format/ranges/sequence.cc: Specify unicode as
encoding and new tests.
* testsuite/std/format/ranges/string.cc: Likewise.
* testsuite/std/format/tuple.cc: Likewise.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Andre Vieira [Fri, 25 Apr 2025 13:02:43 +0000 (14:02 +0100)]
modulo-sched: reject loop conditions when not decrementing with one [PR 116479]
In the commit titled 'doloop: Add support for predicated vectorized loops' the
doloop_condition_get function was changed to accept loops with decrements
larger than 1. This patch rejects such loops for modulo-sched.
gcc/ChangeLog:
PR rtl-optimization/116479
* modulo-sched.cc (doloop_register_get): Reject conditions with
decrements that are not 1.
Jakub Jelinek [Fri, 25 Apr 2025 12:42:01 +0000 (14:42 +0200)]
s390: Allow 5+ argument tail-calls in some -m31 -mzarch special cases [PR119873]
Here is a patch to handle the PARALLEL case too.
I think we can just use rtx_equal_p there, because it will always use
SImode in the EXPR_LIST REGs in that case.
2025-04-25 Jakub Jelinek <jakub@redhat.com>
PR target/119873
* config/s390/s390.cc (s390_call_saved_register_used): Don't return
true if default definition of PARM_DECL SSA_NAME of the same register
is passed in call saved register in the PARALLEL case either.
Tomasz Kamiński [Thu, 24 Apr 2025 07:32:24 +0000 (09:32 +0200)]
libstdc++: Constrain formatter for thread::id [PR119918]
This patch adds the constraint __formatter::__char to the _CharT type parameter
of the formatter<thread::id, _CharT> specialization, matching the constraint
on formatting of the integers/pointers that are used as native handles.
The dependency on the <format> header is changed to <bits/formatfwd.h>.
To achieve that, formatting of pointers is extracted from void const*
specialization to internal __formatter_ptr<_CharT>, that can be forward
declared.
Finally, the handle representation is now printed directly to __fc.out(),
by the formatter for handle type. To support this, internal formatters
can now be constructed from _Spec object as alternative to invoking parse
method.
PR libstdc++/119918
libstdc++-v3/ChangeLog:
* include/bits/formatfwd.h (__format::_Align): Moved from std/format.
(std::__throw_format_error, __format::__formatter_str)
(__format::__formatter_ptr): Declare.
* include/std/format (__format::_Align): Moved to bits/formatfwd.h.
(__formatter_int::__formatter_int): Define.
(__format::__formatter_ptr): Extracted from formatter for const void*.
(std::formatter<const void*, _CharT>, formatter<void*, _CharT>)
(std::formatter<nullptr_t, _CharT>): Delegate to __formatter_ptr<_CharT>.
* include/std/thread (std::formatter<thread::id, _CharT>): Constrain
_CharT template parameter.
(formatter<thread::id, _CharT>::parse): Specify default alignment, and
qualify __throw_format_error to disable ADL.
(formatter<thread::id, _CharT>::format): Use formatters to write directly
to output.
* testsuite/30_threads/thread/id/output.cc: Tests for formatting thread::id
representing not-a-thread with padding and formattable concept.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Tue, 22 Apr 2025 07:56:42 +0000 (09:56 +0200)]
libstdc++: Define __cpp_lib_format_ranges in format header [PR109162]
As P2286R8 and P2585R1 are now fully implemented, we now define the
__cpp_lib_format_ranges feature test macro.
This macro is provided only in <format>.
Uses of internal __glibcxx_format_ranges are also updated.
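User code can test for the macro in the usual way. The following is a minimal sketch; the variable names has_format_ranges and format_ranges_value are made up for this example, and the macro's value is deliberately not assumed.

```cpp
#if __has_include(<format>)
# include <format>
#endif

// True when the library advertises formatting of ranges.
#ifdef __cpp_lib_format_ranges
constexpr bool has_format_ranges = true;
constexpr long format_ranges_value = __cpp_lib_format_ranges;
#else
constexpr bool has_format_ranges = false;
constexpr long format_ranges_value = 0;
#endif
```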
PR libstdc++/109162
libstdc++-v3/ChangeLog:
* include/bits/version.def (format_ranges): Remove no_stdname and
update value.
* include/bits/version.h: Regenerate.
* src/c++23/std.cc.in: Replace __glibcxx_format_ranges with
__cpp_lib_format_ranges.
* testsuite/std/format/formatter/lwg3944.cc: Likewise.
* testsuite/std/format/parse_ctx.cc: Likewise.
* testsuite/std/format/string.cc: Likewise.
* testsuite/std/format/ranges/feature_test.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Fri, 18 Apr 2025 12:56:39 +0000 (14:56 +0200)]
libstdc++: Implement formatters for queue, priority_queue and stack [PR109162]
This patch implements formatter specializations for standard container adaptors
(queue, priority_queue and stack) from P2286R8.
To be able to access the protected `c` member, the adaptors befriend the
corresponding formatter specializations. Note that such a specialization
may be disabled if the container is not formattable; in that case
befriending it is harmless.
As in the case of previous commits, the signatures of the user-facing parse
and format methods of the provided formatters deviate from the standard by
constraining types of parameters:
* _CharT is constrained __formatter::__char
* basic_format_parse_context<_CharT> for parse argument
* basic_format_context<_Out, _CharT> for format second argument
The standard specifies all of the above as unconstrained types. In particular, the
_CharT constraint allows us to befriend all allowed specializations.
Furthermore the standard specifies these formatters as delegating to
formatter<ranges::ref_view<const? _Container>, charT>, which in turn
delegates to range_formatter. This patch avoids one level of indirection,
and the dependency on ranges::ref_view. This is technically observable if
the user specializes formatter<std::ref_view<PD>> where PD is a program-defined
container, but I do not think this case is worth the extra indirection.
This patch also moves formattable and its dependencies to formatfwd.h,
so it can be used in the adaptor formatters without including the format header.
The definition of _Iter_for is changed from an alias denoting
back_insert_iterator<basic_string<_CharT>> to a struct with a nested type typedef
that points to the same type, which is forward declared.
PR libstdc++/109162
libstdc++-v3/ChangeLog:
* include/bits/formatfwd.h (__format::__parsable_with)
(__format::__formattable_with, __format::__formattable_impl)
(__format::__has_debug_format, __format::__const_formattable_range)
(__format::__maybe_const_range, __format::__maybe_const)
(std::formattable): Moved from std/format.
(__format::Iter_for, std::range_formatter): Forward declare.
* include/bits/stl_queue.h (std::formatter): Forward declare.
(std::queue, std::priority_queue): Befriend formatter specializations.
* include/bits/stl_stack.h (std::formatter): Forward declare.
(std::stack): Befriend formatter specializations.
* include/std/format (__format::_Iter_for): Define as struct with
(__format::__parsable_with, __format::__formattable_with)
(__format::__formattable_impl, __format::__has_debug_format)
(_format::__const_formattable_range, __format::__maybe_const_range)
(__format::__maybe_const, std::formattable): Moved to bits/formatfwd.h.
(std::range_formatter): Remove default argument specified in declaration
in bits/formatfwd.h.
* include/std/queue: Include bits/version.h before bits/stl_queue.h.
(formatter<queue<_Tp, _Container, _Compare>, _CharT>)
(formatter<priority_queue<_Tp, _Container, _Compare>, _CharT>): Define.
* include/std/stack: Include bits/version.h before bits/stl_stack.h
(formatter<stack<_Tp, _Container, _Compare>, _CharT>): Define.
* testsuite/std/format/ranges/adaptors.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jason Merrill [Fri, 18 Apr 2025 22:00:34 +0000 (18:00 -0400)]
c++: bad pending_template recursion
limit_bad_template_recursion currently avoids immediate instantiation of
templates from uses in an already ill-formed instantiation, but we still can
get unnecessary recursive instantiation in pending_templates if the
instantiation was queued before the error.
Initially this regressed several libstdc++ tests which seemed to rely on a
static_assert in a function called from another that is separately ill-formed.
For instance, in the 48101_neg.cc tests, we first got an error in find(), then
later instantiate _S_key() (called from find) and got the static_assert error
from there. r16-131-g876d1a22dfaf87 and r16-132-g901900bc37566c changed
the library code (and tests) to make the expected static_assert errors
happen earlier.
gcc/cp/ChangeLog:
* cp-tree.h (struct tinst_level): Add had_errors bit.
* pt.cc (push_tinst_level_loc): Clear it.
(pop_tinst_level): Set it.
(reopen_tinst_level): Check it.
(instantiate_pending_templates): Call limit_bad_template_recursion.
Jonathan Wakely [Thu, 24 Apr 2025 20:55:16 +0000 (21:55 +0100)]
libstdc++: Improve diagnostics for std::packaged_task invocable checks
Moving the static_assert that checks is_invocable_r_v into _Task_state
means it is checked when we instantiate that class template.
Replacing the __create_task_state function with a static member function
_Task_state::_S_create ensures we instantiate _Task_state and trigger
the static_assert immediately, not deep inside the implementation of
std::allocate_shared. This results in shorter diagnostics that don't
show deeply-nested template instantiations before the static_assert
failure.
Placing the static_assert at class scope also helps us to fail earlier
than waiting until when the _Task_state::_M_run virtual function is
instantiated. That also makes the diagnostics shorter and easier to read
(although for C++11 and C++14 modes the class-scope static_assert
doesn't check is_invocable_r, so dangling references aren't detected
until _M_run is instantiated).
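The idea of checking at class scope can be sketched as below. This is an illustration under assumptions, not the libstdc++ code: TaskStateSketch, create and run are invented stand-ins for _Task_state, _S_create and _M_run, and C++17 is assumed.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <type_traits>
#include <utility>

template<typename Fn, typename Signature>
struct TaskStateSketch;   // hypothetical name

template<typename Fn, typename Res, typename... Args>
struct TaskStateSketch<Fn, Res(Args...)>
{
  // Checked as soon as the class is instantiated, before any member
  // function body is looked at.
  static_assert(std::is_invocable_r_v<Res, Fn&, Args...>,
                "callable must be invocable with the task's signature");

  explicit TaskStateSketch(Fn fn) : fn_(std::move(fn)) {}

  // Naming the class here forces its instantiation right away.
  static std::shared_ptr<TaskStateSketch> create(Fn fn)
  { return std::make_shared<TaskStateSketch>(std::move(fn)); }

  Res run(Args... args)
  { return std::invoke(fn_, std::forward<Args>(args)...); }

  Fn fn_;
};

// Usage example: a well-formed callable passes the check and runs.
inline void demo_task_state()
{
  auto add = [](int a, int b) { return a + b; };
  auto state = TaskStateSketch<decltype(add), int(int, int)>::create(add);
  assert(state->run(2, 3) == 5);
}
```

With a callable that does not match the signature, the first diagnostic in this sketch would be the static_assert message rather than an error from deep inside std::make_shared.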
libstdc++-v3/ChangeLog:
* include/std/future (__future_base::_Task_state): Check
invocable requirement here.
(__future_base::_Task_state::_S_create): New static member
function.
(__future_base::_Task_state::_M_reset): Use _S_create.
(__create_task_state): Remove.
(packaged_task): Use _Task_state::_S_create instead of
__create_task_state.
* testsuite/30_threads/packaged_task/cons/dangling_ref.cc:
Adjust dg-error patterns.
* testsuite/30_threads/packaged_task/cons/lwg4154_neg.cc:
Likewise.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Thu, 24 Apr 2025 13:58:58 +0000 (14:58 +0100)]
libstdc++: Add _M_key_compare helper to associative containers
In r10-452-ge625ccc21a91f3 I noted that we don't have an accessor for
invoking _M_impl._M_key_compare in the associative containers. That
meant that the static assertions to check for valid comparison functions
were squirrelled away in _Rb_tree::_S_key instead. As Jason noted in
https://gcc.gnu.org/pipermail/gcc-patches/2025-April/681436.html this
means that the static assertions fail later than we'd like.
This change adds a new _Rb_tree::_M_key_compare member function which
invokes the _M_impl._M_key_compare function object, and then moves the
static_assert from _S_key into _M_key_compare. Now if the static_assert
fails, that's the first error we get, before the "no match for call" and
"invalid conversion" errors.
Because the new function is const-qualified, we now treat LWG 2542 as a
DR for older standards, requiring the comparison function to be const
invocable. Previously we only enforced the LWG 2542 rule for C++17 and
later.
I did consider deprecating support for comparisons which aren't const
invocable, something like this:
// Before LWG 2542 it wasn't strictly necessary for _Compare to be
// const invocable, if you only used non-const container members.
// Define a non-const overload for pre-C++17, deprecated for C++11/14.
#if __cplusplus < 201103L
bool
_M_key_compare(const _Key& __k1, const _Key& __k2)
{ return _M_impl._M_key_compare(__k1, __k2); }
#elif __cplusplus < 201703L
template<typename _Key1, typename _Key2>
[[__deprecated__("support for comparison functions that are not "
"const invocable is deprecated")]]
__enable_if_t<
__and_<__is_invocable<_Compare&, const _Key1&, const _Key2&>,
__not_<__is_invocable<const _Compare&, const _Key1&, const _Key2&>>>::value,
bool>
_M_key_compare(const _Key1& __k1, const _Key2& __k2)
{
static_assert(
__is_invocable<_Compare&, const _Key&, const _Key&>::value,
"comparison object must be invocable with two arguments of key type"
);
return _M_impl._M_key_compare(__k1, __k2);
}
#endif
But I decided that this isn't necessary, because we've been enforcing
the C++17 rule since GCC 8.4 and 9.2, and C++17 has been the default
since GCC 11.1. Users have had plenty of time to fix their invalid
comparison functions.
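For reference, a comparison function that satisfies the const-invocable requirement looks like the sketch below. The comparator ByLength and the helper function are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <type_traits>

// The call operator is const-qualified, so a const comparison object can
// be invoked, which is what the LWG 2542 rule asks for.
struct ByLength {
  bool operator()(const std::string& a, const std::string& b) const
  { return a.size() < b.size(); }
};

static_assert(std::is_invocable_r_v<bool, const ByLength&,
                                    const std::string&, const std::string&>,
              "comparison object must be const invocable");

// "bb" and "cc" compare equivalent under ByLength, so only one is kept.
inline std::size_t count_distinct_lengths()
{
  std::set<std::string, ByLength> s{"a", "bb", "cc", "ddd"};
  return s.size();
}
```

Dropping the const from operator() is the kind of comparator that the change now rejects for all standard modes.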
libstdc++-v3/ChangeLog:
* include/bits/stl_tree.h (_Rb_tree::_M_key_compare): New member
function to invoke comparison function.
(_Rb_tree): Use new member function instead of accessing the
comparison function directly.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Andrew Pinski [Mon, 21 Apr 2025 22:32:26 +0000 (22:32 +0000)]
GCN: Properly switch sections in 'gcn_hsa_declare_function_name' [PR119737]
There are GCN/C++ target as well as offloading codes, where the hard-coded
section names in 'gcn_hsa_declare_function_name' do not fit, and assembly thus
fails:
LLVM ERROR: Size expression must be absolute.
This commit progresses GCN target:
[-FAIL: g++.dg/init/call1.C -std=gnu++17 (internal compiler error: Aborted signal terminated program as)-]
[-FAIL:-]{+PASS:+} g++.dg/init/call1.C -std=gnu++17 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/init/call1.C -std=gnu++17 [-compilation failed to produce executable-]{+execution test+}
[-FAIL: g++.dg/init/call1.C -std=gnu++26 (internal compiler error: Aborted signal terminated program as)-]
[-FAIL:-]{+PASS:+} g++.dg/init/call1.C -std=gnu++26 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} g++.dg/init/call1.C -std=gnu++26 [-compilation failed to produce executable-]{+execution test+}
UNSUPPORTED: g++.dg/init/call1.C -std=gnu++98: exception handling not supported
..., and GCN offloading:
[-XFAIL: libgomp.c++/target-exceptions-throw-1.C (internal compiler error: Aborted signal terminated program as)-]
[-XFAIL: libgomp.c++/target-exceptions-throw-1.C PR119737 at line 7 (test for bogus messages, line )-]
[-XFAIL:-]{+PASS:+} libgomp.c++/target-exceptions-throw-1.C (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} libgomp.c++/target-exceptions-throw-1.C [-compilation failed to produce executable-]{+execution test+}
{+PASS: libgomp.c++/target-exceptions-throw-1.C output pattern test+}
[-XFAIL: libgomp.c++/target-exceptions-throw-2.C (internal compiler error: Aborted signal terminated program as)-]
[-XFAIL: libgomp.c++/target-exceptions-throw-2.C PR119737 at line 7 (test for bogus messages, line )-]
[-XFAIL:-]{+PASS:+} libgomp.c++/target-exceptions-throw-2.C (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} libgomp.c++/target-exceptions-throw-2.C [-compilation failed to produce executable-]{+execution test+}
{+PASS: libgomp.c++/target-exceptions-throw-2.C output pattern test+}
[-XFAIL: libgomp.oacc-c++/exceptions-throw-1.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 (internal compiler error: Aborted signal terminated program as)-]
[-XFAIL: libgomp.oacc-c++/exceptions-throw-1.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 PR119737 at line 7 (test for bogus messages, line )-]
[-XFAIL:-]{+PASS:+} libgomp.oacc-c++/exceptions-throw-1.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} libgomp.oacc-c++/exceptions-throw-1.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 [-compilation failed to produce executable-]{+execution test+}
{+PASS: libgomp.oacc-c++/exceptions-throw-1.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 output pattern test+}
[-XFAIL: libgomp.oacc-c++/exceptions-throw-2.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 (internal compiler error: Aborted signal terminated program as)-]
[-XFAIL: libgomp.oacc-c++/exceptions-throw-2.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 PR119737 at line 9 (test for bogus messages, line )-]
[-XFAIL:-]{+PASS:+} libgomp.oacc-c++/exceptions-throw-2.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} libgomp.oacc-c++/exceptions-throw-2.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 [-compilation failed to produce executable-]{+execution test+}
{+PASS: libgomp.oacc-c++/exceptions-throw-2.C -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O2 output pattern test+}
Thomas Schwinge [Tue, 22 Apr 2025 11:41:22 +0000 (13:41 +0200)]
Adjust 'libgomp.c++/target-exceptions-pr118794-1.C' for 'targetm.arm_eabi_unwinder' [PR118794]
Fix-up for commit aa3e72f943032e5f074b2bd2fd06d130dda8760b
"Add test cases for exception handling constructs in dead code for GCN, nvptx target and OpenMP 'target' offloading [PR118794]":
we need to adjust for configurations with 'targetm.arm_eabi_unwinder', as per:
..., which for ARM is conditional to '#if ARM_UNWIND_INFO' (defined in
'gcc/config/arm/bpabi.h', used for various GCC configurations), and for
C6x unconditional.
Jakub Jelinek [Fri, 25 Apr 2025 08:23:15 +0000 (10:23 +0200)]
Adjust gcc_release for id href web transformations
We now have some script which transforms e.g.
<h2 id="15.1">GCC 15.1</h2>
line in gcc-15/changes.html to
<h2 id="15.1"><a href="#15.1">GCC 15.1</a></h2>
This unfortunately breaks the gcc_release script, which looks for
GCC 15.1 appearing in gennews after optional blanks from the start of
the line in the NEWS file, which is no longer the case, there is
[129]GCC 15.1
or something like that with a URL later on
129. https://gcc.gnu.org/gcc-15/changes.html#15.1
The following patch handles this.
2025-04-25 Jakub Jelinek <jakub@redhat.com>
* gcc_release: Allow optional \[[0-9]+\] before GCC major.minor
in the NEWS file.
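The matching rule described in the ChangeLog entry can be sketched as
follows. This is only an illustration in C++ using std::regex; the
gcc_release script itself is not C++, and the function name
starts_release_section is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if LINE starts the NEWS section for VERSION
// (e.g. "15.1"), optionally preceded by blanks and a "[N]" link marker.
// Assumes VERSION consists only of digits and dots.
bool starts_release_section(const std::string& line,
                            const std::string& version) {
  std::string escaped;
  for (char c : version) {
    if (c == '.')
      escaped += "\\.";  // match a literal dot
    else
      escaped += c;
  }
  // Optional blanks, optional "[digits]", then "GCC major.minor".
  const std::regex re("^[ \\t]*(\\[[0-9]+\\])?GCC " + escaped);
  return std::regex_search(line, re);
}
```

With this pattern both "GCC 15.1" and "[129]GCC 15.1" at the start of a
line are accepted, while a mention in the middle of a line is not.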
[PATCH] RISC-V: Imply C from Zca whenever possible [PR119122]
GCC must imply C extension from Zca extension when it's
possible. It's necessary for achieving compatibility
between different march strings which in fact may be
the same.
E.g., if rv32ic multilib configuration is presented in
GCC, then GCC will not choose this configuration for
linking if -march=rv32i_zca is passed.
Here is a more practical example. From RISC-V
Instruction Set Manual:
Therefore common ISA strings can be updated as follows
to include the relevant Zc extensions, for example:
- RV32IMC becomes RV32IM_Zce
- RV32IMCF becomes RV32IMF_Zce
With current implication rules this will not work well
if rv32imc configuration is presented and a user
passes -march=rv32im_zce. This is how we can check
this with a simple empty test.c source file:
Jakub Jelinek [Thu, 24 Apr 2025 21:44:28 +0000 (23:44 +0200)]
s390: Allow 5+ argument tail-calls in some special cases [PR119873]
protobuf (and therefore firefox too) currently doesn't build on s390*-linux.
The problem is that it uses [[clang::musttail]] attribute heavily, and in
llvm (IMHO llvm bug) [[clang::musttail]] calls with 5+ arguments on
s390*-linux are silently accepted and result in a normal non-tail call.
In GCC we just reject those because the target hook refuses to tail call it
(IMHO the right behavior).
Now, the reason why that happens is as s390_function_ok_for_sibcall attempts
to explain, the 5th argument (assuming normal <= wordsize integer or pointer
arguments, nothing that needs 2+ registers) is passed in %r6 which is not
call clobbered, so we can't do tail call when we'd have to change content
of that register and then caller would assume %r6 content didn't change and
use it again.
In the protobuf case though, the 5th argument is always passed through
from the caller to the musttail callee unmodified, so one can actually
emit just jg tail_called_function or perhaps tweak some registers but
keep %r6 untouched, and in that case I think it is just fine to tail call
it (at least unless the stack slots used for 6+ argument can't be modified
by the callee in the ABI and nothing checks for that).
So, the following patch checks for this special case, where the argument
which uses %r6 is passed in a single register and it is passed default
definition of SSA_NAME of a PARM_DECL with the same DECL_INCOMING_RTL.
It won't really work at -O0 but should work for -O1 and above, at least when
one doesn't really try to modify the parameter conditionally and hope it will
be optimized away in the end.
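A minimal sketch of the call shape this special case is about, assuming
five integer arguments so that the fifth one is passed in %r6 on
s390*-linux as described above. The function name parse_step and the
MUSTTAIL macro are hypothetical; the macro expands to nothing when the
compiler does not advertise the attribute.

```cpp
// Use a musttail attribute only where the compiler advertises one;
// otherwise MUSTTAIL is empty and the call below is an ordinary call.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

// Hypothetical parser step: the first two arguments change on each step,
// while the fifth argument (ctx) is handed to the callee exactly as it
// was received from the caller.
long parse_step(long remaining, long acc, long stride, long limit, long ctx) {
  if (remaining == 0 || acc >= limit)
    return acc + ctx;
  MUSTTAIL return parse_step(remaining - 1, acc + stride, stride, limit, ctx);
}
```

If ctx were modified before the call (say ctx + 1), the register holding
the fifth argument would have to change and the special case would no
longer apply.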
2025-04-24 Jakub Jelinek <jakub@redhat.com>
Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org>
PR target/119873
* config/s390/s390.cc (s390_call_saved_register_used): Don't return
true if default definition of PARM_DECL SSA_NAME of the same register
is passed in call saved register.
(s390_function_ok_for_sibcall): Adjust comment.
* gcc.target/s390/pr119873-1.c: New test.
* gcc.target/s390/pr119873-2.c: New test.
* gcc.target/s390/pr119873-3.c: New test.
* gcc.target/s390/pr119873-4.c: New test.
Gaius Mulley [Thu, 24 Apr 2025 21:09:19 +0000 (22:09 +0100)]
PR modula2/115276: libgm2 wraptime.cc field access all return -1.
This patch provides autoconf tests for each field used in wraptime.cc
referencing struct tm and struct timeval.
libgm2/ChangeLog:
PR modula2/115276
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac (AC_STRUCT_TIMEZONE): Add.
(AC_CHECK_MEMBER): Test for struct tm.tm_year.
(AC_CHECK_MEMBER): Test for struct tm.tm_mon.
(AC_CHECK_MEMBER): Test for struct tm.tm_mday.
(AC_CHECK_MEMBER): Test for struct tm.tm_hour.
(AC_CHECK_MEMBER): Test for struct tm.tm_min.
(AC_CHECK_MEMBER): Test for struct tm.tm_sec.
(AC_CHECK_MEMBER): Test for struct tm.tm_year.
(AC_CHECK_MEMBER): Test for struct tm.tm_yday.
(AC_CHECK_MEMBER): Test for struct tm.tm_wday.
(AC_CHECK_MEMBER): Test for struct tm.tm_isdst.
(AC_CHECK_MEMBER): Test for struct timeval.tv_sec.
(AC_CHECK_MEMBER): Test for struct timeval.tv_sec.
(AC_CHECK_MEMBER): Test for struct timeval.tv_usec.
* libm2iso/wraptime.cc (InitTimeval): Guard against lack of
struct timeval and malloc.
(InitTimezone): Guard against lack of struct tm.tm_zone
and malloc.
(KillTimezone): Ditto.
(InitTimeval): Guard against lack of struct timeval
and malloc.
(KillTimeval): Guard against lack of malloc.
(settimeofday): Guard against lack of struct tm.tm_zone.
(GetFractions): Guard against lack of struct timeval.
(localtime_r): Ditto.
(GetYear): Guard against lack of struct tm.
(GetMonth): Ditto.
(GetDay): Ditto.
(GetHour): Ditto.
(GetMinute): Ditto.
(GetSecond): Ditto.
(GetSummerTime): Ditto.
(GetDST): Guards against lack of struct timezone.
(SetTimezone): Ditto.
(SetTimeval): Guard against lack of struct tm.
Robert Dubner [Thu, 24 Apr 2025 20:26:58 +0000 (16:26 -0400)]
cobol: Repair some exception processing logic.
This patch changes the exception processing logic for the calculation of
reference modifications and table subscripts to be more in accordance with
ISO specifications.
It also adjusts the processing of RETURN-CODE when calling routines that
have no CALL ... RETURNING phrase.
libgomp/testsuite: Fix hip_header_nvidia check, add workaround to test
This is all about using the AMD's HIP header files with
__HIP_PLATFORM_NVIDIA__ defined, i.e. HIP with Nvidia/CUDA; in that case,
HIP is a thin layer on top of CUDA.
First, the check_effective_target_gomp_hip_header_nvidia check failed;
to fix it, -Wno-deprecated-declarations was added - and likewise to the
two affected testcases that actually used the HIP headers on Nvidia.
Doing so, the HIP test was successful but the HIP-BLAS one showed two
issues:
* One seems to be related to include search paths as the HIP header uses
#include "library_types.h" to include that CUDA header. Seemingly, it
tried to include (again) the HIP header hip/library_types.h, not the
CUDA one. I guess some tweaking of -isystem vs. -I could have
prevented this, but the simpler workaround was to just explicitly
include the CUDA one before the HIP header files.
* Once done, everything compiled but linking failed as the association
between three HIP-BLAS functions and their CUDA-BLAS ones did not
work. Solution: Just add three #define for mapping them.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp
(check_effective_target_gomp_hip_header_nvidia): Compile with
"-Wno-deprecated-declarations".
* testsuite/libgomp.c/interop-hip-nvidia-full.c: Likewise.
* testsuite/libgomp.c/interop-hipblas-nvidia-full.c: Likewise.
* testsuite/libgomp.c/interop-hipblas.h: Add workarounds
when using the HIP headers with __HIP_PLATFORM_NVIDIA__.
François Dumont [Thu, 10 Apr 2025 18:58:11 +0000 (20:58 +0200)]
libstdc++: Add std::deque<>::shrink_to_fit test
The existing test is currently testing std::vector. Adapt it for std::deque.
libstdc++-v3/ChangeLog:
* testsuite/util/replacement_memory_operators.h: Adapt for -fno-exceptions
context.
* testsuite/23_containers/deque/capacity/shrink_to_fit.cc: Adapt test
to check std::deque shrink_to_fit method.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Reviewed-by: Tomasz Kaminski <tkaminsk@redhat.com>
aarch64: Fix CFA offsets in non-initial stack probes [PR119610]
PR119610 is about incorrect CFI output for a stack probe when that
probe is not the initial allocation. The main aarch64 stack probe
function, aarch64_allocate_and_probe_stack_space, implicitly assumed
that the incoming stack pointer pointed to the top of the frame,
and thus held the CFA.
aarch64_save_callee_saves and aarch64_restore_callee_saves use a
parameter called bytes_below_sp to track how far the stack pointer
is above the base of the static frame. This patch does the same
thing for aarch64_allocate_and_probe_stack_space.
Also, I noticed that the SVE path was attaching the first CFA note
to the wrong instruction: it was attaching the note to the calculation
of the stack size, rather than to the r11<-sp copy.
gcc/
PR target/119610
* config/aarch64/aarch64.cc (aarch64_allocate_and_probe_stack_space):
Add a bytes_below_sp parameter and use it to calculate the CFA
offsets. Attach the first SVE CFA note to the move into the
associated temporary register.
(aarch64_allocate_and_probe_stack_space): Update calls accordingly.
Start out with bytes_below_sp set to the frame size and decrement
it after each allocation.
gcc/testsuite/
PR target/119610
* g++.dg/torture/pr119610.C: New test.
* g++.target/aarch64/sve/pr119610-sve.C: Likewise.
Jakub Jelinek [Thu, 24 Apr 2025 13:29:50 +0000 (15:29 +0200)]
c: Allow $@` in GNU23/GNU2Y raw string delimiters [PR110343]
Aaron mentioned in the PR that late in C23 N3124 was adopted and
$@` are now part of basic character set. The paper has been implemented
in GCC from what I can see, but we should allow for GNU23/2Y $@` in
raw string delimiters as well, like they are allowed for C++26, because
the delimiters can contain anything from basic character set but space,
()\, tab, form-feed, newline and backspace.
2025-04-24 Jakub Jelinek <jakub@redhat.com>
PR c++/110343
* lex.cc (lex_raw_string): For C allow $@` in raw string delimiters
if CPP_OPTION (pfile, low_ucns) i.e. for C23 and later.
Add checks for nowait/depend and for checks that the returned
CUDA, CUDA_DRIVER and HIP interop objects actually work.
While the CUDA/CUDA_DRIVER ones are only for Nvidia GPUs, HIP
works on both AMD and Nvidia GPUs; on Nvidia GPUs, it is a
very thin wrapper around CUDA.
For Fortran, only a HIP test has been added - using hipfort.
While libgomp.c-c++-common/interop-2.c always works - even without
GPU - and checks for depend / nowait, all others require that
runtime libraries are found at link (and execution) time:
For Nvidia GPUs, libcuda + libcudart or libcublas,
For AMD GPUs, libamdhip64 or libhipblas.
The header files and hipfort modules do not need to be present as a
fallback has been implemented, but if they are, they get used.
Due to the combinations, the basic 1x C/C++, 4x C and 1x Fortran tests
yield 1x C/C++, 14x C and 4 Fortran run-test files.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp (check_effective_target_openacc_cublas,
check_effective_target_openacc_cudart): Update description as
the check requires more.
(check_effective_target_openacc_libcuda,
check_effective_target_openacc_libcublas,
check_effective_target_openacc_libcudart,
check_effective_target_gomp_hip_header_amd,
check_effective_target_gomp_hip_header_nvidia,
check_effective_target_gomp_hipfort_module,
check_effective_target_gomp_libamdhip64,
check_effective_target_gomp_libhipblas): New.
* testsuite/libgomp.c-c++-common/interop-2.c: New test.
* testsuite/libgomp.c/interop-cublas-full.c: New test.
* testsuite/libgomp.c/interop-cublas-libonly.c: New test.
* testsuite/libgomp.c/interop-cuda-full.c: New test.
* testsuite/libgomp.c/interop-cuda-libonly.c: New test.
* testsuite/libgomp.c/interop-hip-amd-full.c: New test.
* testsuite/libgomp.c/interop-hip-amd-no-hip-header.c: New test.
* testsuite/libgomp.c/interop-hip-nvidia-full.c: New test.
* testsuite/libgomp.c/interop-hip-nvidia-no-headers.c: New test.
* testsuite/libgomp.c/interop-hip-nvidia-no-hip-header.c: New test.
* testsuite/libgomp.c/interop-hip.h: New test.
* testsuite/libgomp.c/interop-hipblas-amd-full.c: New test.
* testsuite/libgomp.c/interop-hipblas-amd-no-hip-header.c: New test.
* testsuite/libgomp.c/interop-hipblas-nvidia-full.c: New test.
* testsuite/libgomp.c/interop-hipblas-nvidia-no-headers.c: New test.
* testsuite/libgomp.c/interop-hipblas-nvidia-no-hip-header.c: New test.
* testsuite/libgomp.c/interop-hipblas.h: New test.
* testsuite/libgomp.fortran/interop-hip-amd-full.F90: New test.
* testsuite/libgomp.fortran/interop-hip-amd-no-module.F90: New test.
* testsuite/libgomp.fortran/interop-hip-nvidia-full.F90: New test.
* testsuite/libgomp.fortran/interop-hip-nvidia-no-module.F90: New test.
* testsuite/libgomp.fortran/interop-hip.h: New test.
opts.cc Simplify handling of explicit -flto-partition= and -fipa-reorder-for-locality
The handling of an explicit -flto-partition= and -fipa-reorder-for-locality
should be simpler. No need to have a new default option. We can use opts_set
to check if -flto-partition is explicitly set and use that information in the
error handling.
Remove -flto-partition=default and update accordingly.
Bootstrapped and tested on aarch64-none-linux-gnu.
Gaius Mulley [Thu, 24 Apr 2025 10:15:18 +0000 (11:15 +0100)]
PR modula2/119915: Sprintf1 repeats the entire format string if it starts with a directive
This bugfix is for FormatStrings to ensure that in the case of %x, %u the
procedure function PerformFormatString uses Copy rather than Slice to
avoid the case of an upper bound of zero in Slice. Oddly the %d case
had the correct code.
gcc/m2/ChangeLog:
PR modula2/119915
* gm2-libs/FormatStrings.mod (PerformFormatString): Handle
the %u and %x format specifiers in a similar way to the %d
specifier. Avoid using Slice and use Copy instead.
gcc/testsuite/ChangeLog:
PR modula2/119915
* gm2/pimlib/run/pass/format2.mod: New test.
/* size: 80, cachelines: 2, members: 7 */
/* sum members: 76 */
/* sum bitfield members: 10 bits, bit holes: 1, sum bit holes: 22 bits */
/* last cacheline: 16 bytes */
};
struct dw_attr_struct {
enum dwarf_attribute dw_attr; /* 0 4 */
/* XXX 4 bytes hole, try to pack */
struct dw_val_node dw_attr_val; /* 8 32 */
/* size: 40, cachelines: 1, members: 2 */
/* sum members: 36, holes: 1, sum holes: 4 */
/* last cacheline: 40 bytes */
};
The following patch is an (admittedly not very clean) attempt to decrease
the size of dw_loc_descr_node from 80 bytes to 72 and (more importantly)
dw_attr_struct from 40 bytes to 32 by moving the dw_attr member from
dw_attr_struct into dw_attr_val's padding and similarly move
dw_loc_opc/dtprel/frame_offset_rel members into dw_loc_oprnd1 padding
and dw_loc_addr into dw_loc_oprnd2 padding.
All we need to ensure is that nothing tries to copy whole dw_val_node
structs unless it is copied as part of whole dw_loc_descr_node or
dw_attr_struct copy.
To verify that wasn't the case, I've temporarily added a deleted copy ctor
to dw_val_node and then looked at all the errors/warnings caused by that,
and those were just from memcpy/memmove or structure assignments of whole
dw_loc_descr_node/dw_attr_struct.
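The effect of moving a member into another struct's padding can be seen
in a reduced model. The types below are simplified stand-ins with
hypothetical names, not the real dwarf2out.h definitions, so the sizes
(24 and 16 bytes) differ from the real 40 and 32 bytes.

```cpp
#include <cstdint>

// Simplified model of dw_val_node: a 4-byte tag followed by an
// 8-byte-aligned payload, which leaves 4 bytes of padding.
struct ValNode {
  std::uint32_t val_class;     // offset 0, then 4 bytes of padding
  alignas(8) std::uint64_t v;  // offset 8
};

// Simplified model of the old dw_attr_struct layout.
struct AttrBefore {
  std::uint32_t attr;          // offset 0, then a 4-byte hole
  ValNode val;                 // offset 8
};

// Model of the new layout: attr lives in the former padding.
struct ValNodePacked {
  std::uint32_t val_class;     // offset 0
  std::uint32_t attr;          // offset 4, reuses the padding
  alignas(8) std::uint64_t v;  // offset 8
};

static_assert(sizeof(ValNode) == 16, "4 + 4 (padding) + 8");
static_assert(sizeof(AttrBefore) == 24, "4 + 4 (hole) + 16");
static_assert(sizeof(ValNodePacked) == 16, "hole reused, 8 bytes saved");
```

The catch is the one described above: once attr lives inside the value
node, copying a whole value node on its own would also overwrite attr.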
2025-04-24 Jakub Jelinek <jakub@redhat.com>
PR debug/119711
* dwarf2out.h (struct dw_val_node): Add u member.
(struct dw_loc_descr_node): Remove dw_loc_opc, dtprel,
frame_offset_rel and dw_loc_addr members.
(dw_loc_opc, dw_loc_dtprel, dw_loc_frame_offset_rel, dw_loc_addr):
Define.
(struct dw_attr_struct): Remove dw_attr member.
(dw_attr): Define.
* dwarf2out.cc (loc_descr_equal_p_1): Use dw_loc_dtprel instead of
dtprel.
(output_loc_operands, new_addr_loc_descr, loc_checksum,
loc_checksum_ordered): Likewise.
(resolve_args_picking_1): Use dw_loc_frame_offset_rel instead of
frame_offset_rel.
(loc_list_from_tree_1): Likewise.
(resolve_addr_in_expr): Use dw_loc_dtprel instead of dtprel.
(copy_deref_exprloc): Copy val_class, val_entry and v members
instead of whole dw_loc_oprnd1 and dw_loc_oprnd2.
(optimize_string_length): Copy val_class, val_entry and v members
instead of whole dw_attr_val.
(hash_loc_operands): Use dw_loc_dtprel instead of dtprel.
(compare_loc_operands, compare_locs): Likewise.
Gaius Mulley [Thu, 24 Apr 2025 01:39:36 +0000 (02:39 +0100)]
PR modula2/119914 No error message generated when passing a Ztype to an unbounded array
This patch detects constants ZType, RType, CType being passed to unbounded
arrays and generates an error message highlighting the formal and
actual parameters in error.
gcc/m2/ChangeLog:
PR modula2/119914
* gm2-compiler/M2Check.mod (checkConstMeta): Add check for
Ztype, Rtype and Ctype and unbounded arrays.
(IsZRCType): New procedure function.
(isZRC): Add comment.
* gm2-compiler/M2Quads.mod:
* gm2-compiler/M2Range.mod (gdbinit): New procedure.
(BreakWhenRangeCreated): Ditto.
(CheckBreak): Ditto.
(InitRange): Call CheckBreak.
(Init): Add gdbhook and initialize interactive watch point.
* gm2-compiler/SymbolTable.def (GetNthParamAnyClosest): New
procedure function.
* gm2-compiler/SymbolTable.mod (BreakSym): Remove constant.
(BreakSym): Add Variable.
(stop): Remove.
(gdbhook): New procedure.
(BreakWhenSymCreated): Ditto.
(CheckBreak): Ditto.
(NewSym): Call CheckBreak.
(Init): Add gdbhook and initialize interactive watch point.
(MakeProcedure): Replace guarded call to stop with CheckBreak.
(GetNthParamChoice): New procedure function.
(GetNthParamOrdered): Ditto.
(GetNthParamAnyClosest): Ditto.
(GetOuterModuleScope): Ditto.
gcc/testsuite/ChangeLog:
PR modula2/119914
* gm2/pim/fail/constintarraybyte.mod: New test.
Jan Hubicka [Wed, 23 Apr 2025 16:39:14 +0000 (18:39 +0200)]
Enable ipa-cp cloning over non-hot edges
Currently enabling profile feedback regresses x264 and exchange. In both cases the root of the
issue is that ipa-cp cost model thinks cloning is not relevant when feedback is available
while it clones without feedback.
Consider:
__attribute__ ((used))
int a[1000];
__attribute__ ((noinline))
void
test2(int sz)
{
for (int i = 0; i < sz; i++)
a[i]++;
asm volatile (""::"m"(a));
}
__attribute__ ((noinline))
void
test1 (int sz)
{
for (int i = 0; i < 1000; i++)
test2(sz);
}
int main()
{
test1(1000);
return 0;
}
Here we want to clone both test1 and test2 and specialize for 1000, but
ipa-cp will not do that, since it will skip the call main->test1 as not hot,
because it is called just once both with and without profile feedback.
In this simple testcase even without profile feedback we will track that main
is called once.
I think the testcase shows that hotness of call is not that relevant when
deciding whether we want to propagate constants across it. ipa-cp with IPA
profile can compute overall estimate of time saved (which is existing time
benefit computing time saved per invocation of the function multiplied by
number of executions) and see if result is big enough. An easy check is to
simply call maybe_hot_p on the resulting count.
So this patch makes ipa-cp consider all call sites except those known to be
unlikely executed (i.e. run 0 times in train run or known to lead to something
bad) as interesting, which makes ipa-cp propagate across them, find cloning
candidates and feed them into good_cloning_opportunity_p.
For this I added cs_interesting_for_ipcp_p which also attempts to do right
thing with partial training.
Now good_cloning_opportunity_p will currently return false, since it will figure
out that the call edge is not very frequent.
It already kind of knows that frequency of call instruction itself is not too
important, but instead of computing overall time saved, it tries to compare it
with param_ipa_cp_profile_count_base percentage of counts of call edges. I
think this is not very relevant since estimated time saved per call can be
large. So I dropped this logic and replaced it with simple use of overall
saved time.
Since ipa-cp is not dealing well with the cases where it hits the allowed unit
growth limit, we probably want to be more careful, so I keep existing metric
with this change.
So now we get:
Evaluating opportunities for test1/3.
- considering value 1000 for param #0 sz (caller_count: 1)
good_cloning_opportunity_p (time: 1, size: 8, count_sum: 1 (precise), overall time saved: 1 (adjusted)) -> evaluation: 0.12, threshold: 500
not cloning: time saved is not hot
good_cloning_opportunity_p (time: 129001, size: 20, count_sum: 1 (precise), overall time saved: 129001 (adjusted)) -> evaluation: 6450.05, threshold: 500
First call to good_cloning_opportunity_p considers the case where only test1 is
cloned. In this case time saved is 1 (for passing the value around) and since
it is called just once (count_sum) overall time saved is 1 which is not
considered hot and we also get very low evaluation score.
In the second call we consider cloning chain test1->test2. In this case time
saved is large (129001) since test2 is invoked many times and it is used to
control the loop. We still know that the count is 1 but overall time is
129001 which is already considered relevant and we clone.
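The arithmetic behind the two dump lines can be reproduced with a rough
model. This is not the code in ipa-cp.cc: it assumes, based only on the
numbers printed above, that overall time saved is time multiplied by
count_sum and that the evaluation is overall time saved divided by size.
All names are hypothetical, and the separate hotness check on the overall
time saved is not modelled.

```cpp
// Rough model of the numbers in the dump; NOT the real implementation.
struct CloneCandidate {
  double time_saved_per_call;  // "time" in the dump
  double count_sum;            // number of executions of the calls
  double size;                 // size cost of the clone
};

// Overall time saved: per-invocation saving times number of executions.
double overall_time_saved(const CloneCandidate& c) {
  return c.time_saved_per_call * c.count_sum;
}

// Assumed evaluation score: overall time saved per unit of size.
double evaluation(const CloneCandidate& c) {
  return overall_time_saved(c) / c.size;
}

// Compare the score against the threshold printed in the dump.
bool passes_threshold(const CloneCandidate& c, double threshold) {
  return evaluation(c) >= threshold;
}
```

With {1, 1, 8} for cloning test1 alone the score is 0.125, far below the
threshold of 500; with {129001, 1, 20} for the chain test1->test2 it is
about 6450, which matches the second line of the dump.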
I also try to do something sensible in case we have calls both with
and without IPA profile (which can happen for comdats where profile got missing
or with LTO if some units were not trained).
Instead of checking whether sum of calls with known profile is nonzero, I keep
track if there are other calls and if so, also try the local heuristics that
is used without profile feedback.
The patch improves SPECint with -Ofast -fprofile-use by approx 1% by speeding
up x264 from 99.3s to 91.3s (9%) and exchange from 99.7s to 95.5s (3.3%).
We still get better runtime without profile (86.4s for x264 and 93.8s for exchange).
The main problem I see is that ipa-cp has the global limit for growth of 10%
but does not consider the opportunities in priority order. Consequently if the
limit is hit, randomly some clone opportunities are dropped in favour of
others.
I dumped unit size changes with -flto -Ofast build of SPEC2017. Without patch I get:
Small units can grow up to 16000 instructions and other units are
large. So there is only one 156% growth hitting limits which is exchange
that has recursive cloning that goes specially.
With profile feedback ipacp basically shuts itself off:
So here we get 114% and 127% growth in x264 (two different binaries),
56% growth in Deepsjeng, 61% growth in Exchange which all are above
10% cutoff.
Bootstrapped/regtested x86_64-linux.
gcc/ChangeLog:
* ipa-cp.cc (base_count): Remove.
(struct caller_statistics): Rename n_hot_calls to n_interesting_calls;
add called_without_ipa_profile.
(init_caller_stats): Update.
(cs_interesting_for_ipcp_p): New function.
(gather_caller_stats): collect n_interesting_calls and
called_without_profile.
(ipcp_cloning_candidate_p): Use n_interesting_calls rather than hot.
(good_cloning_opportunity_p): Rewrite heuristics when IPA profile is
present
(estimate_local_effects): Update.
(value_topo_info::propagate_effects): Update.
(compare_edge_profile_counts): Remove.
(ipcp_propagate_stage): Do not collect base_count.
(get_info_about_necessary_edges): Record whether function is called
without profile.
(decide_about_value): Update.
(ipa_cp_cc_finalize): Do not initialize base_count.
* profile-count.cc (profile_count::operator*): New.
(profile_count::operator*=): New.
* profile-count.h (profile_count::operator*): Declare
(profile_count::operator*=): Declare.
* params.opt: Remove ipa-cp-profile-count-base.
* doc/invoke.texi: Likewise.
Jan Hubicka [Wed, 23 Apr 2025 15:04:32 +0000 (17:04 +0200)]
Cost truth_value exprs in i386 vectorizer costs.
this patch implements costing of truth_value exprs. I.e.
a = b < c;
Those seem to be now the most common operations that go to the addss path
except for int->fp and fp->int conversions.
For integer we use setcc, for FP there is CMPccSS and variants which set the
destination register as a mask (i.e. -1 on true and 0 on false). Technically
these need res&1 to get into 1 on true, 0 on false, but looking at examples
where this is used, it is common that the resulting code is optimized avoiding
need for this (except for cases where result is directly saved to memory).
For this reason I am accounting only one sse_op (CMPccSS) itself.
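The relation between the mask and the truth value can be modelled in
scalar code. This is only a sketch with hypothetical function names; it
does not use the SSE instructions themselves.

```cpp
#include <cstdint>

// Scalar model of a mask-producing compare: all bits set when b < c,
// all bits clear otherwise.
std::int32_t less_than_mask(float b, float c) {
  return b < c ? -1 : 0;
}

// The truth value "a = b < c" is the mask reduced to its low bit.
std::int32_t less_than_truth(float b, float c) {
  return less_than_mask(b, c) & 1;
}

// A typical consumer that never needs the "& 1": select through the mask.
std::int32_t select_if_less(float b, float c, std::int32_t x, std::int32_t y) {
  const std::int32_t m = less_than_mask(b, c);
  return (x & m) | (y & ~m);
}
```

In select_if_less the mask is consumed directly, which is the kind of use
where the extra res&1 step disappears.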
Christophe Lyon [Fri, 14 Mar 2025 15:04:29 +0000 (15:04 +0000)]
testsuite: aarch64: arm: Enable vld1x?.c and vst1x?.c on arm [PR71233]
r14-7202-gc8ec3e1327cb1e added vld1xN and vst1xN intrinsics and some
tests on arm, but didn't enable some existing tests.
Since these tests are shared with aarch64, this patch removes the
'dg-skip-if "unimplemented" { arm*-*-* }' directives and relies on the
advsimd-intrinsics.exp driver to define the appropriate flags and
dg-do-what action. (A previous patch removed 'dg-do run', and this
patch removes 'dg-options "-O3"' which would override the options
computed by the test driver)
float16 intrinsics require the neon-fp16 FPU, which is possibly
enabled by advsimd-intrinsics.exp, so we include them unconditionally
on aarch64 or if fp16 is enabled on arm.
poly64 intrinsics would require crypto-neon-fp-armv8: the patch
enables the corresponding tests on aarch64 only, since for arm they
are already covered by other tests in gcc.target/arm/simd/. For some
reason, poly64 tests were missing from x2 and x3 tests, so the patch
adds them as needed.
Tested on aarch64-linux-gnu (no change), arm-linux-gnueabihf (the
additional tests are executed) and various flavors of arm-none-eabi
(the additional tests are compiled-only on M-profile, executed on
A-profile).
libstdc++: fix possible undefined atomic lock-free type aliases in module std
When building for 'i386-*' targets, all basic types are 'sometimes lock-free'
and thus std::atomic_signed_lock_free and std::atomic_unsigned_lock_free are
not declared. In the header <atomic>, they are placed in preprocessor
condition __cpp_lib_atomic_lock_free_type_aliases. In module std, they should
be the same.
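User code that wants these aliases can apply the same guard. A minimal
sketch, assuming a fallback to std::atomic<int> is acceptable; the names
counter_type and bump are hypothetical.

```cpp
#include <atomic>

// Use the lock-free alias when the library provides it, otherwise fall
// back to std::atomic<int>.
#ifdef __cpp_lib_atomic_lock_free_type_aliases
using counter_type = std::atomic_signed_lock_free;
#else
using counter_type = std::atomic<int>;
#endif

// Atomically increment COUNTER and return its new value.
counter_type::value_type bump(counter_type& counter) {
  return counter.fetch_add(1) + 1;
}
```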
libstdc++-v3/ChangeLog:
* src/c++23/std.cc.in (atomic_signed_lock_free): Guard with
preprocessor check for __cpp_lib_atomic_lock_free_type_aliases.
(atomic_unsigned_lock_free): Likewise.
liuhongt [Mon, 31 Mar 2025 03:15:41 +0000 (20:15 -0700)]
Accept allones or 0 operand for vcond_mask op1.
Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
or vpandn.
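The simplifications follow from the bitwise definition of a blend. A
scalar sketch on one 32-bit lane, assuming the mask is all-ones or zero
per lane; the function name blend is hypothetical and this is not the
code in ix86_expand_sse_movcc.

```cpp
#include <cstdint>

// Bitwise select on one lane: bits of IF_TRUE where MASK is set,
// bits of IF_FALSE elsewhere.
std::uint32_t blend(std::uint32_t mask, std::uint32_t if_true,
                    std::uint32_t if_false) {
  return (if_true & mask) | (if_false & ~mask);
}

// Special cases that need no real blend:
//   blend(m, x, 0)   == x & m    (a plain AND)
//   blend(m, 0, x)   == x & ~m   (an AND-NOT)
//   blend(m, ~0u, 0) == m        (a plain move of the mask)
```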
gcc/ChangeLog:
* config/i386/predicates.md (vector_or_0_or_1s_operand): New predicate.
(nonimm_or_0_or_1s_operand): Ditto.
* config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
Extend the predicate of operands1 to accept 0 or allones
operands.
(vcond_mask_<mode><sseintvecmodelower>): Ditto.
(vcond_mask_v1tiv1ti): Ditto.
(vcond_mask_<mode><sseintvecmodelower>): Ditto.
* config/i386/i386.md (mov<mode>cc): Ditto for operands[2] and
operands[3].
* config/i386/i386-expand.cc (ix86_expand_sse_fp_minmax):
Force immediate_operand to register.
gcc/testsuite/ChangeLog:
* gcc.target/i386/blendv-to-maxmin.c: New test.
* gcc.target/i386/blendv-to-pand.c: New test.
Jan Hubicka [Tue, 22 Apr 2025 21:47:14 +0000 (23:47 +0200)]
Fix vectorizer costs of COND_EXPR, MIN_EXPR, MAX_EXPR, ABS_EXPR, ABSU_EXPR
this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR,
MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_EXPR and ABSU_EXPR
but it was only correct for FP variant (where it corresponds to andss clearing
sign bit). Integer abs/absu is open coded as conditional move for SSE2 and
SSE3 introduced an instruction.
MIN_EXPR/MAX_EXPR compiles to minss/maxss for FP and according to Agner Fog
tables they cost the same as sse_op on all targets. Integer translated to single
instruction since SSE3.
COND_EXPR translated to open-coded conditional move for SSE2, SSE4.1 simplified
the sequence and AVX512 introduced masked registers.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Add special cases
for COND_EXPR; make MIN_EXPR, MAX_EXPR, ABS_EXPR and ABSU_EXPR more realistic.
Jakub Jelinek [Tue, 22 Apr 2025 19:27:28 +0000 (21:27 +0200)]
rs6000: Ignore OPTION_MASK_SAVE_TOC_INDIRECT differences in inlining decisions [PR119327]
The following testcase FAILs because the always_inline function can't
be inlined.
The rs6000 backend has similarly to other targets a hook which rejects
inlining which would bring in new ISAs which aren't there in the caller.
And this hook rejects this because of OPTION_MASK_SAVE_TOC_INDIRECT
differences.
This flag is set if explicitly requested or by default depending on
whether the current function looks hot (or at least not cold):
if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
&& flag_shrink_wrap_separate
&& optimize_function_for_speed_p (cfun))
rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
The target nodes that are being compared here are actually the default
target node (which was created when cfun was NULL) vs. one that was
created for the always_inline function when it wasn't NULL, so one
doesn't have it, the other does.
In any case, this flag feels like a tuning decision rather than hard
ISA requirement and I see no problems why we couldn't inline
even explicit -msave-toc-indirect function into -mno-save-toc-indirect
or vice versa.
We already ignore OPTION_MASK_P{8,10}_FUSION which are also more
like tuning flags.
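The idea of ignoring tuning-like flags in the comparison can be sketched
as a subset test on flag masks. The flag names and bit values below are
hypothetical stand-ins, not the real OPTION_MASK_* definitions, and the
function is not rs6000_can_inline_p itself.

```cpp
#include <cstdint>

// Hypothetical flag bits; the real OPTION_MASK_* values differ.
constexpr std::uint64_t FLAG_VSX = 1u << 0;
constexpr std::uint64_t FLAG_SAVE_TOC_INDIRECT = 1u << 1;
constexpr std::uint64_t FLAG_P10_FUSION = 1u << 2;

// Flags treated as tuning choices rather than hard ISA requirements.
constexpr std::uint64_t IGNORED_FOR_INLINING =
    FLAG_SAVE_TOC_INDIRECT | FLAG_P10_FUSION;

// Inlining is allowed when every non-ignored callee flag is also set
// in the caller.
constexpr bool can_inline(std::uint64_t caller, std::uint64_t callee) {
  const std::uint64_t needed = callee & ~IGNORED_FOR_INLINING;
  return (needed & caller) == needed;
}
```

A callee that differs from the caller only in FLAG_SAVE_TOC_INDIRECT is
then accepted, while a callee that needs FLAG_VSX is still rejected when
the caller lacks it.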
2025-04-22 Jakub Jelinek <jakub@redhat.com>
PR target/119327
* config/rs6000/rs6000.cc (rs6000_can_inline_p): Ignore also
OPTION_MASK_SAVE_TOC_INDIRECT differences.
This non-standard optimization breaks real-world code that expects the
result of std::projected to always (be a class type and) have a value_type
member, which isn't true for e.g. I=int*, so revert it for now.
Spencer Abson [Thu, 20 Mar 2025 12:18:57 +0000 (12:18 +0000)]
Induction vectorizer: prevent ICE for scalable types
We currently check that the target supports PLUS_EXPR and MINUS_EXPR
with step_vectype (a fix for pr103523). However, vectorizable_induction
can emit a vectorized MULT_EXPR when calculating the step of each IV for
SLP, and both MULT_EXPR/FLOAT_EXPR when calculating VEC_INIT for float
inductions.
gcc/ChangeLog:
* tree-vect-loop.cc (vectorizable_induction): Add target support
checks for vectorized MULT_EXPR and FLOAT_EXPR where necessary for
scalable types.
Prefer target_supports_op_p over directly_supports_p for these tree
codes.
(vectorizable_nonlinear_induction): Fix a doc comment while I'm
here.