David Malcolm [Wed, 4 Dec 2024 22:34:27 +0000 (17:34 -0500)]
c++: give suggestion on misspelled class name [PR116771]
gcc/cp/ChangeLog:
PR c++/116771
* parser.cc (cp_parser_name_lookup_error): Provide suggestions for
the case of complete failure where there is no scope.
gcc/testsuite/ChangeLog:
PR c++/116771
* g++.dg/spellcheck-pr116771.C: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
sched1 computes ECC (Excess Change Cost) for each insn, which represents
the register pressure attributed to the insn.
Currently the pressure sensitive scheduling algorithm deliberately ignores
negative ECC values (pressure reduction), making them 0 (neutral), leading
to more spills. This happens due to the assumption that the compiler has
a reasonably accurate processor pipeline scheduling model and thus tries
to aggressively fill pipeline bubbles with spill slots.
However, this might not hold: the model might not be available for
certain uarches, or even applicable, especially for modern out-of-order cores.
The existing heuristic induces a spill frenzy on RISC-V, noticeably so on
SPEC2017 507.cactuBSSN_r. If insn scheduling is disabled completely
(-fno-schedule-insns), the total dynamic icount for this workload is
halved, from ~2.5 trillion insns to ~1.3 trillion.
This patch adds --param=cycle-accurate-model={0,1} to gate the spill
behavior.
- The default (1) preserves existing spill behavior.
- Targets/uarches sensitive to spilling can override the param to (0)
to get the reverse effect. The RISC-V backend does so too.
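The param's effect on a single insn's cost can be sketched as follows. This is a hypothetical helper, not the code of model_excess_cost in haifa-sched.cc. The param value is passed in as a plain bool, and ECC values are modelled as plain ints.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not haifa-sched.cc's model_excess_cost): adjust the
// raw excess change cost (ECC) of an insn according to the value of
// --param=cycle-accurate-model, passed in here as a plain bool.
int adjusted_ecc(int base_ecc, bool cycle_accurate_model)
{
  if (cycle_accurate_model)
    // Default (1): trust the pipeline model and treat pressure
    // reductions as neutral by clamping negative ECC to 0.
    return std::max(base_ecc, 0);

  // (0): keep the negative ECC, so insns that reduce register pressure
  // retain their lower cost instead of being flattened to neutral.
  return base_ecc;
}
```

Positive costs are unaffected either way. Only the treatment of pressure reductions changes.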
The actual perf numbers are very promising.
(1) On RISC-V BPI-F3 in-order CPU, -Ofast -march=rv64gcv_zba_zbb_zbs:
Before:
------
Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':
gcc/ChangeLog:
PR target/11472
* params.opt (--param=cycle-accurate-model=): New opt.
* doc/invoke.texi (cycle-accurate-model): Document.
* haifa-sched.cc (model_excess_group_cost): Return negative
delta if param_cycle_accurate_model is 0.
(model_excess_cost): Ceil negative baseECC to 0 only if
param_cycle_accurate_model is 1.
Dump the actual ECC value.
* config/riscv/riscv.cc (riscv_option_override): Set param
to 0.
gcc/testsuite/ChangeLog:
PR target/114729
* gcc.target/riscv/riscv.exp: Enable new tests to build.
* gcc.target/riscv/sched1-spills/spill1.cpp: Add new test.
Filip Kastl [Wed, 4 Dec 2024 14:46:54 +0000 (15:46 +0100)]
contrib: Fix 2 bugs in check-params-in-docs.py
In my last patch for check-params-in-docs.py I accidentally
1. left one occurrence of the 'help_params' variable not renamed
2. converted 'help_params' from a dict to a list
These issues cause the script to error when encountering a parameter
missing in docs. This patch should fix these issues.
contrib/ChangeLog:
* check-params-in-docs.py: 'params' -> 'help_params'. Don't
convert 'help_params' to a list.
David Malcolm [Wed, 4 Dec 2024 13:40:34 +0000 (08:40 -0500)]
arm: use quotes when referring to command-line options [PR90160]
gcc/ChangeLog:
PR translation/90160
* config/arm/arm.cc (arm_option_check_internal): Use quotes in
messages that refer to command-line options. Tweak wording.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Simon Martin [Tue, 3 Dec 2024 13:30:43 +0000 (14:30 +0100)]
c++: Don't reject pointer to virtual method during constant evaluation [PR117615]
We currently reject the following valid code:
=== cut here ===
struct Base {
  virtual void doit (int v) const {}
};
struct Derived : Base {
  void doit (int v) const {}
};
using fn_t = void (Base::*)(int) const;
struct Helper {
  fn_t mFn;
  constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {}
};
void foo () {
  constexpr Helper h (&Derived::doit);
}
=== cut here ===
The problem is that since r6-4014-gdcdbc004d531b4, &Derived::doit is
represented with an expression with type pointer to method and using an
INTEGER_CST (here 1), and that cxx_eval_constant_expression rejects any
such expression with a non-null INTEGER_CST.
This patch uses the same strategy as r12-4491-gf45610a45236e9 (fix for
PR c++/102786), and simply lets such expressions go through.
PR c++/117615
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression): Don't reject
INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE.
Jakub Jelinek [Wed, 4 Dec 2024 09:54:41 +0000 (10:54 +0100)]
c++: Fix up erroneous template error recovery ICE [PR117826]
The testcase in the PR (which can't be easily reduced and is
way too large and has way too many errors) results in an ICE,
because the erroneous_templates hash_map holds trees of erroneous
templates across ggc_collect and some of the templates in there
could be removed, so the later lookup can crash on comparison of
already freed and reused trees.
The following patch makes the hash_map GTY((cache)) marked.
The cp-tree.h changes before the erroneous_template declaration
are needed to make gengtype happy, it didn't like using
directive nor using a template-id as a template parameter.
It is marked cache because if a decl would be solely referenced from
the erroneous_templates hash_map, then nothing would look it up.
2024-12-04 Jakub Jelinek <jakub@redhat.com>
PR c++/117826
* cp-tree.h (struct decl_location_traits): New type.
(erroneous_templates_t): Change using into typedef.
(erroneous_templates): Add GTY((cache)).
* error.cc (cp_adjust_diagnostic_info): Use
hash_map_safe_get_or_insert<true> rather than
hash_map_safe_get_or_insert<false> for erroneous_templates.
Richard Biener [Tue, 3 Dec 2024 13:37:21 +0000 (14:37 +0100)]
tree-optimization/116083 - SLP discovery slowness
One large constant factor of SLP discovery is figuring out the vector
type for each individual lane of each node. That should be redundant,
since the structural comparison of stmts should ensure they end up
the same, so the following computes them only once per node rather
than for each lane.
This cuts the compile-time of the testcase in half.
PR tree-optimization/116083
* tree-vect-slp.cc (vect_build_slp_tree_1): Compute vector
type and max_nunits only once. Remove check for matching
vector type of each lane and replace it with matching check
for LHS type.
Pan Li [Wed, 4 Dec 2024 05:53:52 +0000 (13:53 +0800)]
RISC-V: Add assert for insn operand out of range access [PR117878][NFC]
According to the initial analysis of PR117878, the ICE comes from
the out-of-range operand access for recog_data.operand[]. Thus, add
one assert here to expose this explicitly.
PR target/117878
gcc/ChangeLog:
* config/riscv/riscv-v.cc (vlmax_avl_type_p): Add assert for
out of range access.
(nonvlmax_avl_type_p): Ditto.
Andrew Pinski [Mon, 2 Dec 2024 16:35:23 +0000 (08:35 -0800)]
phiopt: Reset the number of iterations information of a loop when changing an exit from the loop [PR117243]
After r12-5300-gf98f373dd822b3, phiopt could get the following bb structure:
|
middle-bb -----|
| |
| |----| |
phi<1, 2> | |
cond | |
| | |
|--------+---|
This was considered two loops. The inner loop had an upper-bound estimate of 8,
due to the original `for (b = 0; b <= 7; b++)`. The outer loop was already an
infinite one.
So phiopt would come along and change the condition to be unconditionally true,
which turns the inner loop into an infinite one, but it doesn't reset the
estimate on the loop. Cleanup CFG then comes along and merges it into one loop,
but it also does not reset the loop's estimate. Loop unrolling then uses the old
estimate and decides to add an unreachable there.
So the fix is, when phiopt changes an exit of a loop, to reset the estimates,
similar to how cleanup CFG does it when merging some basic blocks.
The standard says that std::exclusive_scan can be used to work in
place, i.e. where the output range is the same as the input range. This
means that the first sum cannot be written to the output until after
reading the first input value, otherwise we'll already have overwritten
the first input value.
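The ordering constraint can be illustrated with a minimal serial sketch. The name exclusive_scan_sketch is hypothetical, and this is neither the libstdc++ nor the PSTL code. The sketch computes the next partial sum from `*first` before it stores the previous sum through `result`, so the call stays correct when `result == first`. It also moves the accumulator rather than copying it.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Minimal sketch of a serial exclusive scan that is safe when `result`
// aliases `first` (in-place use). Hypothetical name; not libstdc++'s code.
template<typename InIt, typename OutIt, typename T, typename BinOp>
OutIt exclusive_scan_sketch(InIt first, InIt last, OutIt result, T init,
                            BinOp op)
{
  for (; first != last; ++first, ++result)
    {
      // Read *first before writing *result: in the in-place case they
      // are the same element.
      T next = op(init, *first);
      *result = std::move(init);  // store the sum of the preceding inputs
      init = std::move(next);     // move, never copy, the accumulator
    }
  return result;
}
```

If the store through `result` came first, an in-place call would overwrite the very input element it was about to read.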
While writing a new testcase I also realised that the serial version of
std::exclusive_scan uses copy construction for the accumulator variable,
but the standard only requires Cpp17MoveConstructible. We also require
move assignable, which is missing from the standard's requirements, but
we should at least use move construction not copy construction.
A similar problem exists for some other new C++17 numeric algos, but
I'll fix the others in a subsequent commit.
libstdc++-v3/ChangeLog:
PR libstdc++/108236
* include/pstl/glue_numeric_impl.h (exclusive_scan): Pass __init
as rvalue.
* include/pstl/numeric_impl.h (__brick_transform_scan): Do not
write through __result until after reading through __first. Move
__init into return value.
(__pattern_transform_scan): Pass __init as rvalue.
* include/std/numeric (exclusive_scan): Move construct instead
of copy constructing.
* testsuite/26_numerics/exclusive_scan/2.cc: New test.
* testsuite/26_numerics/pstl/numeric_ops/108236.cc: New test.
Jonathan Wakely [Mon, 25 Nov 2024 21:55:09 +0000 (21:55 +0000)]
libstdc++: Simplify allocator propagation helpers using 'if constexpr'
Use diagnostic pragmas to allow using `if constexpr` in C++11 mode, so
that we don't need to use tag dispatching.
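The shape of such a helper can be sketched in C++17. Both alloc_on_copy_sketch and the tagged_alloc test allocator are hypothetical names, and this is not the actual __alloc_on_copy. The sketch does not reproduce the diagnostic pragmas that make this compile in C++11 mode. A single `if constexpr` branch on propagate_on_container_copy_assignment replaces the pair of tag-dispatched overloads.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Illustrative allocator whose copy-assignment propagation is chosen by a
// template parameter; `id` identifies which instance a container holds.
template<typename T, bool Propagate>
struct tagged_alloc
{
  using value_type = T;
  using propagate_on_container_copy_assignment = std::bool_constant<Propagate>;

  int id = 0;

  T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};

// Hypothetical helper in the style of __alloc_on_copy: one function body,
// with the compile-time branch replacing tag dispatching.
template<typename Alloc>
void alloc_on_copy_sketch(Alloc& dst, const Alloc& src)
{
  using traits = std::allocator_traits<Alloc>;
  if constexpr (traits::propagate_on_container_copy_assignment::value)
    dst = src;  // propagate: the target adopts the source's allocator
  // else: nothing to do, dst keeps its own allocator
}
```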
These helpers could be removed entirely by just using `if constexpr`
directly in the container member functions, but that's a slightly larger
change that can happen later.
It also looks like we could remove the __alloc_on_copy(const Alloc&)
overload, which is unused.
libstdc++-v3/ChangeLog:
* include/bits/alloc_traits.h (__do_alloc_on_copy): Remove.
(__do_alloc_on_move, __do_alloc_on_swap): Remove.
(__alloc_on_copy, __alloc_on_move, __alloc_on_swap): Use if
constexpr.
Jonathan Wakely [Fri, 15 Nov 2024 21:45:16 +0000 (21:45 +0000)]
libstdc++: Add fancy pointer support to std::forward_list [PR57272]
This takes a very similar approach to the changes for std::list.
libstdc++-v3/ChangeLog:
PR libstdc++/57272
* include/bits/forward_list.h (_GLIBCXX_USE_ALLOC_PTR_FOR_LIST):
Define.
(_Fwd_list_node_base::_M_base_ptr): New member functions.
(_Fwd_list_node::_M_node_ptr): New member function.
(_Fwd_list_iterator, _Fwd_list_const_iterator): Make internal
member functions and data member private. Declare forward_list
and _Fwd_list_base as friends.
(__fwdlist::_Node_base, __fwdlist::_Node, __fwdlist::_Iterator):
New class templates.
(__fwdlist::_Node_traits): New class template.
(_Fwd_list_base): Use _Node_traits to get types. Use _Base_ptr
instead of _Fwd_list_node_base*. Use _M_base_ptr() instead of
taking address of head node.
(forward_list): Likewise.
(_Fwd_list_base::_M_get_node): Do not define for versioned
namespace.
(_Fwd_list_base::_M_put_node): Only convert pointer if needed.
(_Fwd_list_base::_M_create_node): Use __allocate_guarded_obj.
(_Fwd_list_base::_M_destroy_node): New member function.
* include/bits/forward_list.tcc (_Fwd_list_base::_M_insert_after)
(forward_list::_M_splice_after, forward_list::insert_after): Use
const_iterator::_M_const_cast() instead of casting pointers.
(_Fwd_list_base::_M_erase_after): Use _M_destroy_node.
(forward_list::remove, forward_list::remove_if): Only do
downcasts when accessing the value.
(forward_list::sort): Likewise.
* testsuite/23_containers/forward_list/capacity/1.cc: Check
max_size for new node type.
* testsuite/23_containers/forward_list/capacity/node_sizes.cc:
New test.
* testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr.cc:
New test.
* testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr_ignored.cc:
New test.
Jonathan Wakely [Fri, 15 Nov 2024 19:06:47 +0000 (19:06 +0000)]
libstdc++: Add fancy pointer support to std::list [PR57272]
Currently std::list uses raw pointers to connect its nodes, which is
non-conforming. We should use the allocator's pointer type everywhere
that a "pointer" is needed.
Because the existing types like _List_node<T> are part of the ABI now,
we can't change them. To support nodes that are connected by fancy
pointers we need a parallel hierarchy of node types. This change
introduces new class templates parameterized on the allocator's
void_pointer type, __list::_Node_base and __list::_Node_header, and new
class templates parameterized on the allocator's pointer type,
__list::_Node, __list::_Iterator. The iterator class template is used for
both iterator and const_iterator. Whether std::list<T, A> should use the
old _List_node<T> or new __list::_Node<A::pointer> type family internally
is controlled by a new __list::_Node_traits traits template.
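The idea of a traits template that chooses between two node families can be sketched as follows. All names here are hypothetical stand-ins, not the libstdc++ types. The selection rule is a simplification: it keeps the old raw-pointer family when the allocator's pointer is a plain `T*` and uses the new family otherwise. It ignores the _GLIBCXX_USE_ALLOC_PTR_FOR_LIST override and the C++98 case.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Stand-in for the old node family (cf. _List_node<T>): raw pointer links.
template<typename T>
struct raw_node
{
  raw_node* next;
  T value;
};

// Stand-in for the new family (cf. __list::_Node<Ptr>): links use the
// allocator's pointer type, rebound to point at the node itself.
template<typename Ptr>
struct fancy_node
{
  typename std::pointer_traits<Ptr>::template rebind<fancy_node> next;
  typename std::pointer_traits<Ptr>::element_type value;
};

// Hypothetical selector in the spirit of __list::_Node_traits: keep the
// old, ABI-stable family when the allocator's pointer is a plain T*.
template<typename T, typename Alloc>
struct node_traits_sketch
{
  using pointer = typename std::allocator_traits<Alloc>::pointer;
  using node_type = std::conditional_t<std::is_same_v<pointer, T*>,
                                       raw_node<T>, fancy_node<pointer>>;
};
```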
Because std::pointer_traits and std::__to_address are not defined for
C++98, there is no way to support fancy pointers in C++98. For C++98 the
_Node_traits traits always choose the old _List_node family.
In case anybody is currently using std::list with an allocator that has
a fancy pointer, this change would be an ABI break, because their
std::list instantiations would start to (correctly) use the fancy
pointer type. If the fancy pointer just contains a single pointer and so
has the same size, layout, and object representation as a raw pointer,
the code might still work (despite being an ODR violation). But if their
fancy pointer has a different representation, they would need to
recompile all their code using that allocator with std::list. Because
std::list will never use fancy pointers in C++98 mode, recompiling
everything to use fancy pointers isn't even possible if mixing C++98 and
C++11 code that uses std::list. To alleviate this problem, compiling
with -D_GLIBCXX_USE_ALLOC_PTR_FOR_LIST=0 will force std::list to have
the old, non-conforming behaviour and use raw pointers internally. For
testing purposes, compiling with -D_GLIBCXX_USE_ALLOC_PTR_FOR_LIST=9001
will force std::list to always use the new node types. This macro is
currently undocumented, which needs to be fixed.
The original _List_node<T> type is trivially constructible and trivially
destructible, but the new __list::_Node<Ptr> type might not be,
depending on the fancy pointer data members in _Node_base. This means
that std::list needs to explicitly construct and destroy the node
object, not just the value that it contains. This commit adds a new
__allocated_obj helper which wraps an __allocated_ptr and additionally
constructs and destroys an object in the allocated storage.
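A guard that owns both the storage and the constructed object can be sketched as follows. The name allocated_obj_sketch is hypothetical, and this is not __allocated_obj's real interface. The sketch assumes the allocator's pointer is a raw `T*`; a fancy-pointer version would need std::to_address (C++20) to reach the storage. On scope exit it destroys the object and then deallocates, unless ownership has been released, for example after a node is linked into the list.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical RAII helper in the spirit of __allocated_obj (not its real
// interface). Assumes the allocator's pointer is a raw T*.
template<typename Alloc>
class allocated_obj_sketch
{
  using traits = std::allocator_traits<Alloc>;
  using value_type = typename traits::value_type;

  Alloc& alloc_;
  value_type* ptr_;

public:
  // Allocate storage for one object and construct it from args.
  template<typename... Args>
  explicit allocated_obj_sketch(Alloc& a, Args&&... args)
    : alloc_(a), ptr_(traits::allocate(a, 1))
  {
    try
      {
        traits::construct(alloc_, ptr_, std::forward<Args>(args)...);
      }
    catch (...)
      {
        traits::deallocate(alloc_, ptr_, 1);  // construction failed
        throw;
      }
  }

  // Unless released, destroy the object and then free its storage.
  ~allocated_obj_sketch()
  {
    if (ptr_)
      {
        traits::destroy(alloc_, ptr_);
        traits::deallocate(alloc_, ptr_, 1);
      }
  }

  allocated_obj_sketch(const allocated_obj_sketch&) = delete;
  allocated_obj_sketch& operator=(const allocated_obj_sketch&) = delete;

  value_type& operator*() const { return *ptr_; }

  // Transfer ownership to the caller, e.g. once a node is linked in.
  value_type* release() { return std::exchange(ptr_, nullptr); }
};
```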
Pretty printers for std::list need to be updated to handle the new node
types. Potentially we just can't pretty print them, because we don't
know how to follow the fancy pointers to traverse the list.
libstdc++-v3/ChangeLog:
PR libstdc++/57272
PR libstdc++/110952
* include/bits/allocated_ptr.h (__allocated_ptr::get): Add
const.
(__allocated_ptr::operator bool, __allocated_ptr::release): New
member functions.
(__allocate_guarded): Add inline.
(__allocated_obj): New class template.
(__allocate_guarded_obj): New function template.
* include/bits/list.tcc (_List_base::_M_clear()): Replace uses
of raw pointers. Use _M_destroy_node.
(list::emplace, list::insert): Likewise.
(list::sort): Adjust check for 0 or 1 wsize. Use template
argument list for _Scratch_list.
* include/bits/stl_list.h (_GLIBCXX_USE_ALLOC_PTR_FOR_LIST):
Define.
(_List_node_base::_Base_ptr): New typedef.
(_List_node_base::_M_base): New member functions.
(_List_node_header::_M_base): Make public and add
using-declaration for base class overload.
(__list::_Node_traits, __list::_Node_base)
(__list::_Node_header, __list::_Node, __list::_Iterator): New
class templates.
(_Scratch_list): Turn class into class template. Use _Base_ptr
typedef instead of _List_node_base*.
(_List_node::_Node_ptr): New typedef.
(_List_node::_M_node_ptr): New member function.
(_List_base, _List_impl): Use _Node_traits to get node types.
(_List_base::_M_put_node): Convert to fancy pointer if needed.
(_List_base::_M_destroy_node): New member function.
(_List_base(_List_base&&, _Node_alloc_type&&)): Use if constexpr
to make function a no-op for fancy pointers.
(_List_base::_S_distance, _List_base::_M_distance)
(_List_base::_M_node_count): Likewise.
(list): Use _Node_traits to get iterator, node and pointer
types.
(list::_M_create_node): Use _Node_ptr typedef instead of _Node*.
Use __allocate_guarded_obj instead of _M_get_node.
(list::end, list::cend, list::empty): Use node header's
_M_base() function instead of taking its address.
(list::swap): Use _Node_traits to get node base type.
(list::_M_create_node, list::_M_insert): Use _Node_ptr instead
of _Node*.
(list::_M_erase): Likewise. Use _M_destroy_node.
(__distance): Overload for __list::_Iterator.
(_Node_base::swap, _Node_base::_M_transfer): Define non-inline
member functions of class templates.
(_Node_header::_M_reverse): Likewise.
* testsuite/23_containers/list/capacity/29134.cc: Check max_size
for allocator of new node type.
* testsuite/23_containers/list/capacity/node_sizes.cc: New test.
* testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr.cc:
New test.
* testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr_ignored.cc:
New test.
Jonathan Wakely [Tue, 12 Nov 2024 15:36:17 +0000 (15:36 +0000)]
libstdc++: Refactor std::list::size() for cxx11 ABI
Remove some preprocessor conditionals by moving the _M_size member for
the cxx11 ABI into a new base class, which is empty for the gcc4-compat
ABI.
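Moving a member into a conditionally empty base class can be sketched as follows. The names are hypothetical; _List_size and _List_node_header are the real types. When the base is empty, the empty-base optimization (which GCC applies here) means the header costs nothing extra, as for the gcc4-compat ABI.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: the size counter lives in a base class that is
// empty when the ABI has no size member.
template<bool UseSize>
struct list_size_sketch
{
  std::size_t size = 0;
};

template<>
struct list_size_sketch<false>
{ };

template<bool UseSize>
struct list_header_sketch : list_size_sketch<UseSize>
{
  void* next = nullptr;
  void* prev = nullptr;
};
```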
Move some unused members that are only retained for ABI compatibility to
the end of _List_base and add an explanatory comment. Stop using
list::_M_node_count and list::_S_distance and then move them to the end
of std::list with a comment too.
libstdc++-v3/ChangeLog:
* include/bits/stl_list.h (_List_size): New struct.
(_List_node_header): Replace _M_size member with _List_size base
class.
(_List_node_header(_List_node_header&&)): Replace explicit uses
of _M_size with initializing the base.
(_List_node_header::_M_init): Likewise.
(_List_base::_S_distance, _List_base::_M_distance)
(_List_base::_M_node_count): Move to end of class body and add
comment.
(list::_S_distance, list::_M_node_count): Likewise.
(list::size): Inline _M_node_count effects to here.
(list::splice(iterator, list&, iterator, iterator)): Use #if and
call std::distance instead of _S_distance.
Some diagnostics are issued late, e.g. in avr_print_operand().
This patch uses the insn's location as a proxy for the operand
location. Without the patch, the location is usually input_location,
which points to the closing } of the function body.
gcc/
* config/avr/avr.cc (avr_insn_location): New variable.
(avr_final_prescan_insn): Set avr_insn_location.
(avr_asm_final_postscan_insn): Unset avr_insn_location after last insn.
(avr_print_operand): Pass avr_insn_location to warning_at.
Jeff Law [Tue, 3 Dec 2024 19:26:01 +0000 (12:26 -0700)]
Move some CRC tests into the gcc.dg/torture directory
Jakub noted that these tests were using dg-skip-if directives that implied the
tests were expected to run under multiple optimization options, which means
they probably should be in gcc.dg/torture rather than in the gcc.dg directory.
This moves the relevant tests from gcc.dg to gcc.dg/torture.
Nina Ranns [Tue, 3 Dec 2024 14:58:21 +0000 (14:58 +0000)]
c++/contracts: ICE with contract assert on non-empty statement [PR 117579]
Contract assert is an attribute on an empty statement. Currently we assert
that the statement is empty before emitting the assertion. This has been
changed to a conditional check that the statement is empty before the
assertion is emitted.
PR c++/117579
gcc/cp/ChangeLog:
* parser.cc (cp_parser_statement): Replace assertion with a
conditional check that the statement containing a contract assert
is empty.
gcc/testsuite/ChangeLog:
* g++.dg/contracts/pr117579.C: New test.
Signed-off-by: Nina Ranns <dinka.ranns@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
David Malcolm [Tue, 3 Dec 2024 18:53:46 +0000 (13:53 -0500)]
maintainer-scripts: build the libgdiagnostics docs for the website [PR117883]
maintainer-scripts/ChangeLog:
PR web/117883
* update_web_docs_git: Introduce SPHINX_VENV to make
it easier to test the script. Add the libgdiagnostics docs
and testsuite to the files to be preserved. Use sphinx to build
the libgdiagnostics docs as HTML. Copy them into $DOCSDIR.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Tue, 3 Dec 2024 18:53:42 +0000 (13:53 -0500)]
maintainer-scripts: fix jit docs on website
I noticed whilst working on the libgdiagnostics docs
that some errors like this were occurring in the jit docs:
/tmp/gcc-doc-update.3782849/gcc/gcc/jit/docs/cp/topics/asm.rst:63: WARNING: Include file '/tmp/gcc-doc-update.3782849/gcc/gcc/testsuite/jit.dg/test-asm.cc' not found or reading it failed
which was occurring for:
* test-asm.c and .cc
* test-switch.c
* test-accessing-union.c
and indeed https://gcc.gnu.org/onlinedocs/jit/topics/asm.html is
currently missing various code examples.
Fixed thusly; tested locally.
maintainer-scripts/ChangeLog:
* update_web_docs_git: Add the jit testsuite to the files to
be preserved, since this is used by the jit docs.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Edwin Lu [Tue, 3 Dec 2024 01:29:55 +0000 (17:29 -0800)]
RISC-V: Fix test target selector
The previous target selector was not properly gating the tests to rv32
and rv64 targets. This was triggering an excess failure on rv32 targets
where it would try to run the zbc64 tests. Fix the selector.
Paul Thomas [Tue, 3 Dec 2024 15:56:53 +0000 (15:56 +0000)]
Fortran: Fix class transformational intrinsic calls [PR102689]
2024-12-03 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/102689
* trans-array.cc (get_array_ref_dim_for_loop_dim): Use the arg1
class container carried in ss->info as the seed for a lhs in
class valued transformational intrinsic calls that are not the
rhs of an assignment. Otherwise, the lhs variable expression is
taken from the loop chain. For this latter case, the _vptr and
_len fields are set.
(gfc_trans_create_temp_array): Use either of the lhs expression
seeds to build a class variable that will take the returned
descriptor as its _data field. In the case that the arg1 expr.
is used, 'atmp' must be marked as unused, a typespec built with
the correct rank and the _vptr and _len fields set. The element
size is provided for the temporary allocation and to set the
descriptor span.
(gfc_array_init_size): When an intrinsic type scalar expr3 is
used in allocation of a class array, use its element size in
the descriptor dtype.
* trans-expr.cc (gfc_conv_class_to_class): Class valued
transformational intrinsics return the pointer to the array
descriptor as the _data field of a class temporary. Extract
directly and return the address of the class temporary.
(gfc_conv_procedure_call): Store the expression for the first
argument of a class valued transformational intrinsic function
in the ss info class_container field. Later, use its type as
the element type in the call to gfc_trans_create_temp_array.
(fcncall_realloc_result): Add a dtype argument and use it in
the descriptor, when available.
(gfc_trans_arrayfunc_assign): For class lhs, build a dtype with
the lhs rank and the rhs element size and use it in the call to
fcncall_realloc_result.
gcc/testsuite/
PR fortran/102689
* gfortran.dg/class_transformational_1.f90: New test for class-
valued reshape.
* gfortran.dg/class_transformational_2.f90: New test for other
class-valued transformational intrinsics.
Joseph Myers [Tue, 3 Dec 2024 13:01:58 +0000 (13:01 +0000)]
preprocessor: Adjust C rules on UCNs for C23 [PR117162]
As noted in bug 117162, C23 changed some rules on UCNs to match C++
(this was a late change agreed in the resolution to CD2 comment
US-032, implementing changes from N3124), which we need to implement.
Allow UCNs below 0xa0 outside identifiers for C, with a
pedwarn-if-pedantic before C23 (and a warning with -Wc11-c23-compat)
except for the always-allowed cases of UCNs for $ @ `. Also as part
of that change, do not allow \u0024 in identifiers as equivalent to $
for C23.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/117162
libcpp/
* include/cpplib.h (struct cpp_options): Add low_ucns.
* init.cc (struct lang_flags, lang_defaults): Add low_ucns.
(cpp_set_lang): Set low_ucns.
* charset.cc (_cpp_valid_ucn): For C, allow UCNs below 0xa0
outside identifiers, with a pedwarn if pedantic before C23 or a
warning with -Wc11-c23-compat. Do not allow \u0024 in identifiers
for C23.
Richard Biener [Tue, 3 Dec 2024 07:56:35 +0000 (08:56 +0100)]
tree-optimization/117874 - optimize SLP discovery budget use
The following tries to avoid eating into the SLP discovery limit
when we can do cheaper checks first. Together with the previous
patch this allows using two-lane SLP discovery for mult_su3_an
in 433.milc.
PR tree-optimization/117874
* tree-vect-slp.cc (vect_build_slp_tree_2): Perform early
reassoc checks before eating into discovery limit.
Richard Biener [Tue, 3 Dec 2024 07:52:48 +0000 (08:52 +0100)]
Use the number of relevant stmts to limit SLP build
The following removes scalar stmt counting from loop vectorization,
along with its use as the base for limiting both the final SLP tree
size and discovery. Instead, use the number of relevant stmts, which
is conveniently the number of stmt_vec_infos we create and in turn
includes things like pattern stmts.
PR tree-optimization/117874
* tree-vectorizer.h (vec_info_shared::n_stmts): Remove.
(LOOP_VINFO_N_STMTS): Likewise.
* tree-vectorizer.cc (vec_info_shared::vec_info_shared): Adjust.
* tree-vect-loop.cc (vect_get_datarefs_in_loop): Do not
count stmts.
(vect_analyze_loop_2): Adjust. Pass stmt_vec_info.length ()
to vect_analyze_slp as SLP tree size limit.
The previous version of the patch was based on the mistaken assumption that
features in /proc/cpuinfo had matching names to the feature names that gcc and
gas accept.
This patch enables the fp8 feature when the f8cvt feature is enabled, under the
assumption that fpmr is always enabled when f8cvt is.
Jonathan Wakely [Mon, 2 Dec 2024 15:13:52 +0000 (15:13 +0000)]
libstdc++: Make std::vector<bool> constructor noexcept (LWG 3778)
LWG 3778 was approved in November 2022. We already implement all the
changes except for one, which this commit does.
The new test verifies all the changes from LWG 3778, not just the one
implemented here.
libstdc++-v3/ChangeLog:
* include/bits/stl_bvector.h (vector(const allocator_type&)):
Add noexcept, as per LWG 3778.
* testsuite/23_containers/vector/bool/cons/lwg3778.cc: New test.
Jakub Jelinek [Tue, 3 Dec 2024 10:17:49 +0000 (11:17 +0100)]
tree-ssanames, match.pd: get_nonzero_bits/with_*_nonzero_bits* cleanups and improvements [PR117420]
The following patch implements the with_*_nonzero_bits* cleanups and
improvements I was talking about.
get_nonzero_bits is extended to also handle BIT_AND_EXPR (as a tree or
as SSA_NAME with BIT_AND_EXPR def_stmt), new function is added for the
bits known to be set (get_known_nonzero_bits) and the match.pd predicates
are renamed and adjusted, so that there is no confusion on which one to
use (one is named and documented to be internal), changed so that it can be
used only as a simple predicate, not match some operands, and that it doesn't
try to match twice for the GIMPLE case (where SSA_NAME with integral or pointer
type matches, but SSA_NAME with BIT_AND_EXPR def_stmt matched differently).
Furthermore, get_nonzero_bits just returns the all bits set (or
get_known_nonzero_bits no bits set) fallback if the argument isn't a
SSA_NAME (nor INTEGER_CST or whatever the functions handle explicitly).
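The rule "X == C (or X & Z == Y | C) is impossible if ~nonzero(X) & C != 0" can be modelled with plain 32-bit masks. This is a hypothetical sketch; GCC itself works on wide_int values and SSA range info. "possible" plays the role of get_nonzero_bits, and "known" plays the role of get_known_nonzero_bits. The fallbacks are all ones and zero respectively.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit mask model (not GCC's wide_int API).
// possible: bits that may be nonzero (all ones if nothing is known).
// known:    bits that are certainly nonzero (zero if nothing is known).

// X == Y cannot hold if Y certainly has a bit set that X can never have.
constexpr bool equality_impossible(std::uint32_t possible_x, std::uint32_t known_y)
{
  return (known_y & ~possible_x) != 0;
}

// X & Z: a bit can only be set if it may be set in both operands.
constexpr std::uint32_t possible_of_and(std::uint32_t possible_x,
                                        std::uint32_t possible_z)
{
  return possible_x & possible_z;
}

// Y | C: every bit of the constant C is certainly set in the result.
constexpr std::uint32_t known_of_ior(std::uint32_t known_y, std::uint32_t c)
{
  return known_y | c;
}
```

For example, if X is `a << 4`, its low four bits are always zero, so `X == 3` folds to false, while `X == 0x30` cannot be decided from the masks alone.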
2024-12-03 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117420
* tree-ssanames.h (get_known_nonzero_bits): Declare.
* tree-ssanames.cc (get_nonzero_bits): New wrapper function. Move old
definition to ...
(get_nonzero_bits_1): ... here, add static. Change widest_int in
function comment to wide_int.
(get_known_nonzero_bits_1, get_known_nonzero_bits): New functions.
* match.pd (with_possible_nonzero_bits2): Rename to ...
(with_possible_nonzero_bits): ... this. Guard the bit_and case with
#if GENERIC. Change to a normal match predicate without parameters.
Rename the old with_possible_nonzero_bits match to ...
(with_possible_nonzero_bits_1): ... this.
(with_certain_nonzero_bits2): Remove.
(with_known_nonzero_bits_1, with_known_nonzero_bits): New match
predicates.
(X == C (or X & Z == Y | C) is impossible if ~nonzero(X) & C != 0):
Use with_known_nonzero_bits@0 instead of
(with_certain_nonzero_bits2 @1), use with_possible_nonzero_bits@0
instead of (with_possible_nonzero_bits2 @0) and
get_known_nonzero_bits (@1) instead of wi::to_wide (@1).
Jakub Jelinek [Tue, 3 Dec 2024 10:16:37 +0000 (11:16 +0100)]
bitintlower: Fix up ?ROTATE_EXPR lowering [PR117847]
In the ?ROTATE_EXPR lowering I forgot to handle rotation by 0 correctly.
An INTEGER_CST 0 is very unlikely, as it would probably be folded away, but
a non-constant count can't just use p - n, because then the shift count
is out of bounds when n is zero.
In the FE I use n == 0 ? x : (x << n) | (x >> (p - n)) but bitintlower
isn't prepared at this point to have the bb split, and I am not sure if
using COND_EXPR is a good idea either, so the patch uses (p - n) % p.
Perhaps I should just disable lowering the rotate in the FE for the
non-mode precision BITINT_TYPEs too.
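The effect of the `(p - n) % p` count can be checked on an ordinary 32-bit integer. This is a sketch with p = 32, not the _BitInt lowering itself, and it assumes the count n is already reduced to the range [0, p). For n == 0 both shift counts are 0, so the result is x rather than an out-of-range `x >> p`.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kPrec = 32;  // p: precision of the value being rotated

// Rotate left by n using (p - n) % p as the right-shift count, so both
// shift counts stay in [0, p) even when n == 0.
inline std::uint32_t rotl_sketch(std::uint32_t x, unsigned n)
{
  assert(n < kPrec);  // the sketch assumes the count is already reduced mod p
  return (x << n) | (x >> ((kPrec - n) % kPrec));
}
```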
2024-12-03 Jakub Jelinek <jakub@redhat.com>
PR middle-end/117847
* gimple-lower-bitint.cc (gimple_lower_bitint) <case LROTATE_EXPR>:
Use m = (p - n) % p instead of m = p - n for the other shift count.
Tobias Burnus [Tue, 3 Dec 2024 10:02:03 +0000 (11:02 +0100)]
OpenMP: 'allocate' directive - fixes for 'alignof' and [[omp::decl]]
Fixed a check to permit [[omp::decl(allocate,...)]] parsing in C.
Additionally, we discussed that 'allocate align' should not affect
'alignof' to avoid issues like with:
int a;
_Alignas(_Alignof(a)) int b;
#pragma omp allocate(a) align(128)
_Alignas(_Alignof(a)) int c;
Thus, the alignment is no longer set in the C and Fortran front ends,
but for static variables now in varpool_node::finalize_decl.
(For stack variables, the alignment is handled in gimplify_bind_expr.)
NOTE: 'omp allocate' is not yet supported in C++.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_allocate): Only check scope if
not in_omp_decl_attribute. Remove setting the alignment.
gcc/ChangeLog:
* cgraphunit.cc (varpool_node::finalize_decl): Set alignment
based on OpenMP's 'omp allocate' attribute/directive.
gcc/fortran/ChangeLog:
* trans-decl.cc (gfc_finish_var_decl): Remove setting the alignment.
libgomp/ChangeLog:
* libgomp.texi (Memory allocation): Mention (non-)effect of 'align'
on _Alignof.
* testsuite/libgomp.c/allocate-7.c: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/allocate-18.c: Check that alignof is unaffected
by 'omp allocate'.
* c-c++-common/gomp/allocate-19.c: Likewise.
aarch64: Add flags field to aarch64-simd-pragma-builtins.def
This patch adds a flags field to aarch64-simd-pragma-builtins.def
and uses it to add attributes to the function declaration.
gcc/
* config/aarch64/aarch64-simd-pragma-builtins.def: Add a flags
field to each entry.
* config/aarch64/aarch64-builtins.cc: Update includes accordingly.
(aarch64_pragma_builtins_data): Add a flags field.
(aarch64_init_pragma_builtins): Use the flags field to add attributes
to the function declaration.
Saurabh Jha [Tue, 3 Dec 2024 09:54:01 +0000 (09:54 +0000)]
aarch64: Add support for AdvSIMD lut
The AArch64 FEAT_LUT extension is optional from Armv9.2-A and mandatory
from Armv9.5-A. It introduces instructions for lookup table reads with
bit indices.
This patch adds support for AdvSIMD lut intrinsics. The intrinsics for
this extension are implemented as the following builtin functions:
* vluti2{q}_lane{q}_{u8|s8|p8}
* vluti2{q}_lane{q}_{u16|s16|p16|f16|bf16}
* vluti4q_lane{q}_{u8|s8|p8}
* vluti4q_lane{q}_{u16|s16|p16|f16|bf16}_x2
We also introduced a new approach to do lane checks for AdvSIMD.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc
(aarch64_builtin_signatures): Add binary_lane.
(aarch64_fntype): Handle it.
(simd_types): Add 16-bit x2 types.
(aarch64_pragma_builtins_checker): New class.
(aarch64_general_check_builtin_call): Use it.
(aarch64_expand_pragma_builtin): Add support for lut unspecs.
* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): Add lut option.
* config/aarch64/aarch64-simd-pragma-builtins.def
(ENTRY_BINARY_LANE): Modify to use new ENTRY macro.
(ENTRY_TERNARY_VLUT8): Macro to declare lut intrinsics.
(ENTRY_TERNARY_VLUT16): Macro to declare lut intrinsics.
(REQUIRED_EXTENSIONS): Declare lut intrinsics.
* config/aarch64/aarch64-simd.md
(@aarch64_<vluti_uns_op><VLUT:mode><VB:mode>): Instruction
pattern for luti2 and luti4 intrinsics.
(@aarch64_lutx2<VLUT:mode><VB:mode>): Instruction pattern for
luti4x2 intrinsics.
* config/aarch64/aarch64.h
(TARGET_LUT): lut flag.
* config/aarch64/iterators.md: Iterators and attributes for lut.
* doc/invoke.texi: Document extension in AArch64 Options.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/lut-incorrect-range.c: New test.
* gcc.target/aarch64/simd/lut-no-flag.c: New test.
* gcc.target/aarch64/simd/lut.c: New test.
Co-authored-by: Vladimir Miloserdov <vladimir.miloserdov@arm.com>
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Saurabh Jha [Tue, 3 Dec 2024 09:54:00 +0000 (09:54 +0000)]
aarch64: Refactor AdvSIMD intrinsics
Refactor AdvSIMD intrinsics defined using the new pragma-based approach
to make them more extensible.
Introduce a new struct, simd_type, which defines types using a mode and
qualifiers, and use objects of this struct in the declaration of intrinsics
in the aarch64-simd-pragma-builtins.def file.
Change the aarch64_pragma_builtins_data struct to support return and
argument types.
Refactor aarch64_fntype and aarch64_expand_pragma_builtin so that they
initialise the corresponding vectors in a loop. As we add intrinsics with
more arguments, these functions won't need to change to support them.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc
(ENTRY): Modify to add support for return and argument types.
(struct simd_type): New struct to declare types using mode and
qualifiers.
(struct aarch64_pragma_builtins_data): Replace mode with
the array of types to support return and argument types.
(aarch64_fntype): Modify to handle different signatures.
(aarch64_expand_pragma_builtin): Modify to handle different
signatures.
* config/aarch64/aarch64-simd-pragma-builtins.def
(ENTRY_VHSDF): Rename to ENTRY_BINARY_VHSDF.
(ENTRY_BINARY): New macro to declare binary intrinsics.
(ENTRY_BINARY_VHSDF): Remove signature argument and use
ENTRY_BINARY.
Co-authored-by: Vladimir Miloserdov <vladimir.miloserdov@arm.com>
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
We had a function called aarch64_vq_mode, where "vq" stood for "vector
quadword". It was used by aarch64_simd_container_mode (from which it
originated) and in preparation for various SVE ...Q instructions.
It's useful for follow-on patches if we also split out the handling
of 64-bit modes from aarch64_simd_container_mode. Keeping to the
same naming scheme would replace "q" with "d", but that has
unfortunate connotations, and doesn't AFAIK correspond to any
actual SVE mnemonics.
This patch therefore splits the handling out into a function called
aarch64_v64_mode and renames aarch64_vq_mode to aarch64_v128_mode for
consistency. I didn't rename the "vq" local variables, since I think
those names make sense in context.
aarch64: Move some diagnostic functions to aarch64.cc
Some of the diagnostics reported for SVE builtins would also be
useful for Advanced SIMD builtins, so this patch moves them from
aarch64-sve-builtins.cc to aarch64.cc. I put them in a new aarch64
namespace for now -- perhaps in future they should be generic.
gcc/
* config/aarch64/aarch64-sve-builtins.cc (report_non_ice)
(report_out_of_range, report_neither_nor, report_not_one_of)
(report_not_enum): Move to...
* config/aarch64/aarch64.cc: ...here, putting them in the aarch64
namespace, and...
* config/aarch64/aarch64-protos.h: ...declare them here.
Pan Li [Fri, 29 Nov 2024 12:33:19 +0000 (20:33 +0800)]
Match: Refactor the unsigned SAT_SUB match patterns [NFC]
This patch refactors all the unsigned SAT_SUB patterns:
* Extract the type check outside.
* Re-arrange the related match pattern forms together.
The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
gcc/ChangeLog:
* match.pd: Refactor the unsigned SAT_SUB match patterns.
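The commit does not list the source forms involved, so the following is only a sketch, assuming the common C idioms for unsigned saturating subtraction (x - y clamped at 0). The function names are hypothetical; which forms the SAT_SUB patterns actually recognize is defined by match.pd itself.

```cpp
#include <cstdint>

// Branch form: subtract only when it cannot wrap.
uint32_t sat_sub_branch (uint32_t x, uint32_t y)
{
  return x > y ? x - y : 0;
}

// Mask form: -(uint32_t) (x >= y) is all-ones when x >= y, else zero,
// so the wrapped difference is kept or cleared without a branch.
uint32_t sat_sub_mask (uint32_t x, uint32_t y)
{
  return (x - y) & -(uint32_t) (x >= y);
}

// Overflow-builtin form: GCC's __builtin_sub_overflow returns true
// when the unsigned subtraction wraps.
uint32_t sat_sub_overflow (uint32_t x, uint32_t y)
{
  uint32_t r;
  return __builtin_sub_overflow (x, y, &r) ? 0 : r;
}
```

All three compute the same value; keeping the related forms next to each other in match.pd is what makes such equivalences easy to audit.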
Pan Li [Tue, 3 Dec 2024 06:08:07 +0000 (14:08 +0800)]
RISC-V: Fix incorrect optimization options passing to reduc and ternop
Like the strided load/store tests, the vector reduc and ternop testcases
are designed to pick up different sorts of optimization options, but
according to the execution log in gcc.log these options are actually
ignored.
This patch fixes this in almost the same way as we did for strided
load/store.
The following test suites pass with this patch.
* The rv64gcv full regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Fix the incorrect optimization
options passing to testcases.
Heiko Eißfeldt [Tue, 3 Dec 2024 08:47:59 +0000 (09:47 +0100)]
replace atoi with strtoul in varasm.cc (decode_reg_name_and_count) [PR114540]
The function uses atoi, which can silently return valid-looking numbers
even for some numbers in the string that are too large.
Furthermore, the verification that all the characters in asmspec are
decimal digits can be simplified when using strtoul: we can check just
the first digit and whether the end pointer points to '\0'.
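A minimal sketch of the strtoul-based check described above. The helper parse_reg_number and its LIMIT parameter are hypothetical and not the actual varasm.cc code; the sketch only shows why checking the first character plus the end pointer is enough.

```cpp
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: parse ASMSPEC as a decimal register number
// below LIMIT.  Returns -1 if it is not a plain in-range number.
static int
parse_reg_number (const char *asmspec, unsigned long limit)
{
  // strtoul skips leading whitespace and accepts a sign, so insist
  // that the string starts with a decimal digit.
  if (!std::isdigit (static_cast<unsigned char> (asmspec[0])))
    return -1;

  char *end;
  errno = 0;
  unsigned long n = std::strtoul (asmspec, &end, 10);

  // Reject trailing non-digits, overflow and out-of-range values,
  // which atoi would not report.
  if (*end != '\0' || errno == ERANGE || n >= limit || n > INT_MAX)
    return -1;
  return static_cast<int> (n);
}
```

For example, "12" yields 12, while "12a", " 1", "-1" and a 23-digit string are all rejected.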
2024-12-03 Heiko Eißfeldt <heiko@hexco.de>
PR middle-end/114540
* varasm.cc (decode_reg_name_and_count): Use strtoul instead of atoi
and simplify verification that the whole asmspec contains just decimal
digits.
* gcc.dg/pr114540.c: New test.
Signed-off-by: Heiko Eißfeldt <heiko@hexco.de>
Co-authored-by: Jakub Jelinek <jakub@redhat.com>
With SLP forced we fail to consider using single-lane SLP for a case
that we still end up discovering as hybrid (in the PR in question
this is because we run into the SLP discovery limit due to excessive
association).
Pan Li [Mon, 2 Dec 2024 13:57:53 +0000 (21:57 +0800)]
RISC-V: Fix incorrect optimization options passing to cond and builtin
Like the strided load/store tests, the vector cond and builtin testcases
are designed to pick up different sorts of optimization options, but
according to the execution log in gcc.log these options are actually
ignored.
This patch fixes this in almost the same way as we did for strided
load/store.
The following test suites pass with this patch.
* The rv64gcv full regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Fix the incorrect optimization
options passing to testcases.
These files only still exist upstream; they should have been removed as
part of commit 104cc285533e742726ae18a7d3d4f384dd20c350
"gccrs: Refactor TypeResolution to be a simple query based system".
Jonathan Wakely [Thu, 28 Nov 2024 12:32:59 +0000 (12:32 +0000)]
libstdc++: Simplify std::_Destroy using 'if constexpr'
This is another place where we can use 'if constexpr' to replace
dispatching to a specialized class template, improving compile times and
avoiding a function call.
libstdc++-v3/ChangeLog:
* include/bits/stl_construct.h (_Destroy(FwdIter, FwdIter)): Use
'if constexpr' instead of dispatching to a member function of a
class template.
(_Destroy_n(FwdIter, Size)): Likewise.
(_Destroy_aux, _Destroy_n_aux): Only define for C++98.
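A minimal C++17 sketch of the 'if constexpr' pattern described above. destroy_range and counted are hypothetical names, and libstdc++'s real std::_Destroy handles additional cases (for example constant evaluation). For a trivially destructible element type the loop is discarded at compile time, so no helper class template or extra call is needed.

```cpp
#include <iterator>
#include <memory>
#include <new>
#include <type_traits>

// Destroy [first, last) only if the element type has a non-trivial
// destructor; otherwise the body compiles to nothing.
template <typename ForwardIt>
void destroy_range (ForwardIt first, ForwardIt last)
{
  using value_type = typename std::iterator_traits<ForwardIt>::value_type;
  if constexpr (!std::is_trivially_destructible_v<value_type>)
    for (; first != last; ++first)
      std::destroy_at (std::addressof (*first));
}

// Element type that counts destructor calls, for the usage example.
struct counted
{
  static inline int destroyed = 0;
  ~counted () { ++destroyed; }
};
```

Calling destroy_range on three placement-new'd counted objects runs three destructors, while calling it on an int array does nothing.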
Patrick Palka [Mon, 2 Dec 2024 15:58:50 +0000 (10:58 -0500)]
c++: some further concepts cleanups
This patch further cleans up the concepts code following the removal of
Concepts TS support:
* concept-ids are now the only kind of "concept check", so we can
simplify some code accordingly. In particular resolve_concept_check
seems like a no-op and can be removed.
* In turn, deduce_constrained_parameter doesn't seem to do anything
interesting.
* In light of the above we might as well inline finish_type_constraints
into its only caller.
* Introduce and use a helper for obtaining the prototype parameter of
a concept, i.e. its first template parameter.
* placeholder_extract_concept_and_args is only ever called on a
concept-id, so it's simpler to inline it into its callers.
* There's no such thing as a template-template-parameter with a
type-constraint, so we can remove such handling from the parser.
This means is_constrained_parameter is currently equivalent to
declares_constrained_type_template_parameter, so let's prefer
to use the latter.
* Remove WILDCARD_DECL and instead use the concept's prototype parameter
as the dummy first argument of a type-constraint during template
argument coercion.
* Remove a redundant concept_definition_p overload.
gcc/cp/ChangeLog:
* constraint.cc (resolve_concept_check): Remove.
(deduce_constrained_parameter): Remove.
(finish_type_constraints): Inline into its only caller
cp_parser_placeholder_type_specifier and remove.
(build_concept_check_arguments): Coding style tweaks.
(build_standard_check): Inline into its only caller ...
(build_concept_check): ... here.
(build_type_constraint): Use the prototype parameter as the
first template argument.
(finish_shorthand_constraint): Remove function concept
handling. Use concept_prototype_parameter.
(placeholder_extract_concept_and_args): Inline into its
callers and remove.
(equivalent_placeholder_constraints): Adjust after
placeholder_extract_concept_and_args removal.
(iterative_hash_placeholder_constraint): Likewise.
* cp-objcp-common.cc (cp_common_init_ts): Remove WILDCARD_DECL
handling.
* cp-tree.def (WILDCARD_DECL): Remove.
* cp-tree.h (WILDCARD_PACK_P): Remove.
(type_uses_auto_or_concept): Remove declaration of nonexistent
function.
(append_type_to_template_for_access_check): Likewise.
(finish_type_constraints): Remove declaration.
(placeholder_extract_concept_and_args): Remove declaration.
(deduce_constrained_parameter): Remove declaration.
(resolve_constraint_check): Remove declaration.
(valid_requirements_p): Remove declaration of nonexistent
function.
(finish_concept_name): Likewise.
(concept_definition_p): Remove redundant overload.
(concept_prototype_parameter): Define.
* cxx-pretty-print.cc (pp_cxx_constrained_type_spec): Adjust
after placeholder_extract_concept_and_args.
* error.cc (dump_decl) <case WILDCARD_DECL>: Remove.
(dump_expr) <case WILDCARD_DECL>: Likewise.
* parser.cc (is_constrained_parameter): Inline into
declares_constrained_type_template_parameter and remove.
(cp_parser_check_constrained_type_parm): Declare static.
(finish_constrained_template_template_parm): Remove.
(cp_parser_constrained_template_template_parm): Remove.
(finish_constrained_parameter): Remove dead code guarded by
cp_parser_constrained_template_template_parm.
(declares_constrained_type_template_parameter): Adjust after
is_constrained_parameter removal.
(declares_constrained_template_template_parameter): Remove.
(cp_parser_placeholder_type_specifier): Adjust after
finish_type_constraints removal. Check the prototype parameter
earlier, before build_type_constraint.
Use concept_prototype_parameter.
(cp_parser_parameter_declaration): Remove dead code guarded by
declares_constrained_template_template_parameter.
* pt.cc (convert_wildcard_argument): Remove.
(convert_template_argument): Remove WILDCARD_DECL handling.
(coerce_template_parameter_pack): Likewise.
(tsubst) <case TEMPLATE_TYPE_PARM>: Likewise.
(type_dependent_expression_p): Likewise.
(make_constrained_placeholder_type): Remove function concept
handling.
(placeholder_type_constraint_dependent_p): Remove WILDCARD_DECL
handling.
Andreas Schwab [Thu, 21 Nov 2024 14:35:01 +0000 (15:35 +0100)]
m68k: don't allow o/o in movdi, movdf, movxf
The movdi, movdf and movxf patterns allow both operands to be offsettable
memory, but output_move_double cannot handle overlapping objects. This is
visible in the failure of gcc.c-torture/execute/pr97073.c when compiled
with LTO (where cprop optimizes out the AND operation; the failure also
occurs without LTO when the AND is removed). Split the constraints so
that the operands cannot both be "o" in the same insn.
* config/m68k/m68k.md (movdi+1, movdf+1, movxf+2): Split
constraints so that the operands cannot both be "o".
Jakub Jelinek [Mon, 2 Dec 2024 13:51:57 +0000 (14:51 +0100)]
Add trailing newlines where needed
Especially in the recent CRC commits, I see
\ No newline at end of file
in almost every second file. So, I went through
the diff between r15-1 and current trunk in gcc/, looking for
additions of such problems which weren't intentional (e.g. the
Wtrailing-whitespace* tests have it there intentionally) and
just added the missing newline elsewhere.
Andre Vieira [Mon, 2 Dec 2024 13:35:03 +0000 (13:35 +0000)]
arm, mve: Adding missing Runtime Library Exception to header files
Add missing Runtime Library Exception to mve header files to bring them into
line with other similar headers. Not adding it in the first place was an
oversight.
Richard Biener [Mon, 2 Dec 2024 10:07:46 +0000 (11:07 +0100)]
tree-optimization/116352 - SLP scheduling and stmt order
The PR uncovers unchecked constraints on the ability to code-generate
with SLP, but also latent issues with regard to stmt order checking,
since loop (early-break) and BB (for quite some time) vectorization
are no longer constrained to single BBs. In particular get_later_stmt
simply compares UIDs of stmts, but that's only reliable when they
are in the same BB.
For the PR in question the problematic case is demoting an SLP node
to external, which fails to check that we can actually code-generate
this in the way we do (using get_later_stmt). The following thus adds
checking that we demote to external only when all defs are from
the same BB.
We no longer vectorize gcc.dg/vect/bb-slp-49.c but the testcase was
for a wrong-code issue and the vectorization done is a no-op.
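A toy model of the ordering problem described above, using hypothetical types rather than GCC's stmt_vec_info: if statement UIDs are only reliably ordered within one basic block, a "later statement" query is only meaningful when both statements share a block, which is the condition the fix checks before demoting a node to external.

```cpp
#include <optional>

// Hypothetical statement record: UIDs are assumed to be comparable
// only between statements of the same basic block.
struct stmt_info
{
  int bb_index;  // basic block containing the statement
  unsigned uid;  // order within that block
};

// Return the later of A and B, or nothing when they live in different
// blocks and comparing UIDs would not say which one comes later.
std::optional<stmt_info> later_stmt (const stmt_info &a, const stmt_info &b)
{
  if (a.bb_index != b.bb_index)
    return std::nullopt;
  return a.uid > b.uid ? a : b;
}
```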
Jakub Jelinek [Mon, 2 Dec 2024 12:55:02 +0000 (13:55 +0100)]
testsuite: Adjust rs6000-ldouble-2.c for switch to -std=gnu23 by default [PR117663]
-std=gnu23/-std=c23 changes LDBL_EPSILON for IBM long double, see r13-3029 and
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602738.html
for details.
That change even had a note:
"and when we move to a C2x
default, gcc.target/powerpc/rs6000-ldouble-2.c will need an
appropriate option added to keep using an older language version"
The following patch just implements that, fixing the rs6000-ldouble-2.c regression.
2024-12-02 Jakub Jelinek <jakub@redhat.com>
PR testsuite/117663
* gcc.target/powerpc/rs6000-ldouble-2.c: Add -std=gnu17 to dg-options.
yulong [Mon, 2 Dec 2024 01:31:53 +0000 (09:31 +0800)]
RISC-V: Add intrinsics support for SiFive Xsfvfnrclipxfqf extensions.
This commit adds intrinsics support for Xsfvfnrclipxfqf. We also redefine
the enum type frm_op_type in the riscv-vector-builtins-bases.h file, because
it is used in the sifive-vector-builtins-bases.cc file.
Pan Li [Fri, 29 Nov 2024 03:57:34 +0000 (11:57 +0800)]
RISC-V: Fix incorrect optimization options passing to widen
Like the strided load/store tests, the vector widen testcases are
designed to pick up different sorts of optimization options, but
according to the execution log in gcc.log these options are actually
ignored.
This patch fixes this in almost the same way as we did for strided
load/store.
The following test suites pass with this patch.
* The rv64gcv full regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Fix the incorrect optimization
options passing to testcases.
This patch fixes the strided load/store testcase failures that appear
once the various optimization options are actually passed to the testcases.
* Add the no-strict-align option for vectors.
* Adjust dg-final with any-opts and/or no-opts where the RTL dump changes
with different optimization options (like O2, O3, zvl).
The following test suites pass with this patch.
* The rv64gcv full regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
Slava Barinov [Sun, 1 Dec 2024 18:59:13 +0000 (11:59 -0700)]
[PATCH] gcc: configure: Fix the optimization flags cleanup
Currently the sed command in the flag cleanup removes all -O[0-9] flags,
ignoring the context. This causes problems when an optimization flag is
passed to the linker:
CFLAGS="-Os -Wl,-O1 -Wl,--hash-style=gnu"
is converted into
CFLAGS="-Os -Wl,-Wl,--hash-style=gnu"
which makes configure fail with ld: unrecognized option '-Wl,-Wl'.
gcc/
* configure.ac: Only remove -O[0-9] if not preceded by a comma.
* configure: Regenerated.
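The rule "only remove -O[0-9] when it is not preceded by a comma" can be illustrated with a small string filter. This is a hypothetical C++ illustration of the rule, not the sed expression used in configure.ac. As an assumption, it drops the digit flag together with one following space, which matches the before/after example above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Drop "-O<digit>" flags (plus one trailing space) unless the flag is
// preceded by a comma, as in "-Wl,-O1", where it belongs to the linker.
std::string strip_opt_flags (const std::string &flags)
{
  std::string out;
  for (std::size_t i = 0; i < flags.size (); )
    {
      bool is_opt = flags.compare (i, 2, "-O") == 0
                    && i + 2 < flags.size ()
                    && std::isdigit (static_cast<unsigned char> (flags[i + 2]));
      bool after_comma = i > 0 && flags[i - 1] == ',';
      if (is_opt && !after_comma)
        {
          i += 3;                               // skip "-O" and the digit
          if (i < flags.size () && flags[i] == ' ')
            ++i;                                // and one separator
          continue;
        }
      out += flags[i++];
    }
  return out;
}
```

With this rule, "-O2 -Os -Wl,-O1 -Wl,--hash-style=gnu" becomes "-Os -Wl,-O1 -Wl,--hash-style=gnu", leaving the linker's -O1 intact.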
Jovan Vukic [Sun, 1 Dec 2024 18:57:41 +0000 (11:57 -0700)]
Thanks for the feedback on the first version of the patch. Accordingly:
I have corrected the code formatting as requested. I added new tests to
the existing file phi-opt-11.c, instead of creating a new one.
I performed testing before and after applying the patch on the x86
architecture, and I confirm that there are no new regressions.
The logic and general code of the patch itself have not been changed.
> So the A EQ/NE B expression, we can reverse A and B in the expression
> and still get the same result. But don't we have to be more careful for
> the TRUE/FALSE arms of the ternary? For BIT_AND we need ? a : b for
> BIT_IOR we need ? b : a.
>
> I don't see that gets verified in the existing code or after your
> change. I suspect I'm just missing something here. Can you clarify how
> we verify that BIT_AND gets ? a : b for the true/false arms and that
> BIT_IOR gets ? b : a for the true/false arms?
I did not communicate this clearly last time, but the existing optimization
simplifies the expression "(cond & (a == b)) ? a : b" to the simpler "b".
Similarly, the expression "(cond & (a == b)) ? b : a" simplifies to "a".
Thus, the existing optimization and mine together perform the following
simplifications:
(cond & (a == b)) ? a : b -> b
(cond & (a == b)) ? b : a -> a
(cond | (a != b)) ? a : b -> a
(cond | (a != b)) ? b : a -> b
For this reason, for BIT_AND_EXPR when we have A EQ B, it is sufficient to
confirm that one operand matches the true/false arm and the other matches
the false/true arm. In both cases, we simplify the expression to the third
operand of the ternary operation (i.e., OP0 ? OP1 : OP2 simplifies to OP2).
This is achieved in the value_replacement function after successfully
setting the value of *code within the rhs_is_fed_for_value_replacement
function to EQ_EXPR.
For BIT_IOR_EXPR, the same check is performed for A NE B, except now
*code remains NE_EXPR, and then value_replacement returns the second
operand (i.e., OP0 ? OP1 : OP2 simplifies to OP1).
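The four simplifications listed above can be sanity-checked by brute force over a small range of values. This is only a check of the algebra, not how tree-ssa-phiopt.cc reasons about the expressions.

```cpp
// Exhaustively check, for small integers, that
//   (cond & (a == b)) ? a : b -> b     (cond & (a == b)) ? b : a -> a
//   (cond | (a != b)) ? a : b -> a     (cond | (a != b)) ? b : a -> b
bool phiopt_identities_hold ()
{
  for (int cond = 0; cond <= 1; ++cond)
    for (int a = -3; a <= 3; ++a)
      for (int b = -3; b <= 3; ++b)
        {
          bool eq = (cond & (a == b)) != 0;
          bool ne = (cond | (a != b)) != 0;
          if ((eq ? a : b) != b) return false;
          if ((eq ? b : a) != a) return false;
          if ((ne ? a : b) != a) return false;
          if ((ne ? b : a) != b) return false;
        }
  return true;
}
```

The check passes because whenever the BIT_AND condition is true, a == b makes both arms equal, and whenever the BIT_IOR condition is false, a == b again makes both arms equal.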
2024-10-30 Jovan Vukic <Jovan.Vukic@rt-rk.com>
gcc/ChangeLog:
* tree-ssa-phiopt.cc (rhs_is_fed_for_value_replacement): Add a new
optimization opportunity for BIT_IOR_EXPR and a != b.
(operand_equal_for_value_replacement): Ditto.
Mariam Arutunian [Mon, 11 Nov 2024 20:01:19 +0000 (13:01 -0700)]
[PATCH v7 11/12] Replace the original CRC loops with a faster CRC calculation
After the loop exit, an internal function call (CRC, CRC_REV) is added, and its
result is assigned to the output CRC variable (the variable where the
calculated CRC is stored after the loop execution). The removal of the loop is
left to CFG cleanup and DCE.
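A sketch of the kind of naive, bit-at-a-time CRC loop these patches target, next to a table-driven equivalent. The CRC-8 parameters (polynomial 0x07, MSB first, initial value 0) are an illustrative choice. The table version is only one example of a faster calculation: how GCC expands the CRC internal function is target-dependent and may differ, for instance by using carry-less multiplication where available.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Naive CRC-8: polynomial long division, one bit per inner iteration.
uint8_t crc8_naive (const uint8_t *data, std::size_t len)
{
  uint8_t crc = 0;
  for (std::size_t i = 0; i < len; ++i)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; ++bit)
        crc = (crc & 0x80) ? (uint8_t) ((crc << 1) ^ 0x07) : (uint8_t) (crc << 1);
    }
  return crc;
}

// Precompute the effect of eight shift/XOR steps for every byte value.
static std::array<uint8_t, 256> make_crc8_table ()
{
  std::array<uint8_t, 256> t{};
  for (int v = 0; v < 256; ++v)
    {
      uint8_t c = (uint8_t) v;
      for (int bit = 0; bit < 8; ++bit)
        c = (c & 0x80) ? (uint8_t) ((c << 1) ^ 0x07) : (uint8_t) (c << 1);
      t[v] = c;
    }
  return t;
}

// Table-driven equivalent: one lookup per input byte.
uint8_t crc8_table (const uint8_t *data, std::size_t len)
{
  static const std::array<uint8_t, 256> table = make_crc8_table ();
  uint8_t crc = 0;
  for (std::size_t i = 0; i < len; ++i)
    crc = table[crc ^ data[i]];
  return crc;
}
```

Both functions return 0xF4 for the ASCII string "123456789", the standard check value for this CRC-8 variant.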
gcc/
* gimple-crc-optimization.cc (optimize_crc_loop): New function.
(execute): Add optimize_crc_loop function call.
Mariam Arutunian [Mon, 11 Nov 2024 20:00:37 +0000 (13:00 -0700)]
[PATCH v7 10/12] Verify detected CRC loop with symbolic execution and LFSR matching
Symbolically execute potential CRC loops and check whether the loop actually
calculates a CRC, using LFSR matching: the calculated CRC and the created LFSR
are compared on each iteration of the potential CRC loop.
gcc/
* Makefile.in (OBJS): Add crc-verification.o.
* crc-verification.cc: New file.
* crc-verification.h: New file.
* gimple-crc-optimization.cc (loop_calculates_crc): New function.
(is_output_crc): Likewise.
(swap_crc_and_data_if_needed): Likewise.
(validate_crc_and_data): Likewise.
(optimize_crc_loop): Likewise.
(get_output_phi): Likewise.
(execute): Add check whether potential CRC loop calculates CRC.
* sym-exec/sym-exec-state.cc (create_reversed_lfsr): New function.
(create_forward_lfsr): Likewise.
(last_set_bit): Likewise.
(create_lfsr): Likewise.
* sym-exec/sym-exec-state.h (is_bit_vector): Reorder, make the function public and static.
(create_reversed_lfsr): New static function declaration.
(create_forward_lfsr): New static function declaration.
This makes it possible to execute code at the bit level, assigning
symbolic values to the variables which don't have initial values.
Only CRC-specific operations are supported.
Example:
uint8_t crc;
uint8_t pol = 1;
crc = crc ^ pol;
During symbolic execution, crc's value will be:
crc(7), crc(6), ... crc(1), crc(0) ^ 1
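A toy model of the bit-level symbolic value in the example above, for an 8-bit variable with bits 0 to 7. The type and functions are hypothetical and unrelated to the actual sym-exec/ classes; each bit is simply represented as a printable expression.

```cpp
#include <string>
#include <vector>

// One symbolic expression per bit; index 0 is the least significant bit.
using sym_byte = std::vector<std::string>;

// Give every bit of NAME an unknown symbolic value "name(i)".
sym_byte make_symbolic (const std::string &name)
{
  sym_byte v (8);
  for (int i = 0; i < 8; ++i)
    v[i] = name + "(" + std::to_string (i) + ")";
  return v;
}

// XOR with a known constant: only bits where C is 1 change, gaining "^ 1".
sym_byte xor_const (sym_byte v, unsigned c)
{
  for (int i = 0; i < 8; ++i)
    if ((c >> i) & 1)
      v[i] += " ^ 1";
  return v;
}

// Print the most significant bit first, as in the example above.
std::string render (const sym_byte &v)
{
  std::string s;
  for (int i = 7; i >= 0; --i)
    {
      s += v[i];
      if (i > 0)
        s += ", ";
    }
  return s;
}
```

Rendering xor_const (make_symbolic ("crc"), 1) gives "crc(7), crc(6), crc(5), crc(4), crc(3), crc(2), crc(1), crc(0) ^ 1".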
gcc/
* Makefile.in (OBJS): Add sym-exec/sym-exec-expression.o,
sym-exec/sym-exec-state.o, sym-exec/sym-exec-condition.o.
* configure (sym-exec): New subdir.
* sym-exec/sym-exec-condition.cc: New file.
* sym-exec/sym-exec-condition.h: New file.
* sym-exec/sym-exec-expr-is-a-helper.h: New file.
* sym-exec/sym-exec-expression.cc: New file.
* sym-exec/sym-exec-expression.h: New file.
* sym-exec/sym-exec-state.cc: New file.
* sym-exec/sym-exec-state.h: New file.
Mariam Arutunian [Mon, 11 Nov 2024 19:59:04 +0000 (12:59 -0700)]
[PATCH v7 08/12] Add a new pass for naive CRC loops detection
This patch adds a new compiler pass aimed at identifying naive CRC
implementations, characterized by the presence of a loop calculating
a CRC (polynomial long division). Upon detection of a potential CRC,
the pass prints an informational message.
The pass performs the CRC optimization if the optimization level is >= 2,
unless -fno-optimize-crc is given.
This pass is added for the detection and optimization of naive CRC
implementations, improving the efficiency of CRC-related computations.
This patch includes only the initial fast checks for filtering out non-CRCs;
the verification of detected potential CRCs and the optimization itself will
be provided in subsequent patches.
gcc/
* Makefile.in (OBJS): Add gimple-crc-optimization.o.
* common.opt (foptimize-crc): New option.
* common.opt.urls: Regenerate to add foptimize-crc.
* doc/invoke.texi (-foptimize-crc): Add documentation.
* gimple-crc-optimization.cc: New file.
* opts.cc (default_options_table): Add OPT_foptimize_crc.
(enable_fdo_optimizations): Enable optimize_crc.
* passes.def (pass_crc_optimization): Add new pass.
* timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
* tree-pass.h (make_pass_crc_optimization): New extern function
declaration.
Mark Harmstone [Thu, 7 Nov 2024 03:59:18 +0000 (03:59 +0000)]
Write binary annotations for CodeView S_INLINESITE symbols
Add "binary annotations" at the end of CodeView S_INLINESITE symbols,
which are a series of compressed integers that represent how line
numbers map to addresses.
This requires assembler support; you will need commit b3aa594d ("gas:
add .cv_ucomp and .cv_scomp pseudo-directives") in binutils.
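A sketch of the compressed-integer scheme used by CodeView binary annotations, assuming the encoding found in Microsoft's cvinfo/cvdump and in LLVM's CodeView writer. With this patch GCC leaves the actual encoding to gas through .cv_ucomp and .cv_scomp, so the functions below are hypothetical illustrations only.

```cpp
#include <cstdint>
#include <vector>

// Unsigned values take 1, 2 or 4 bytes; the top bits of the first byte
// (0xxxxxxx, 10xxxxxx, 110xxxxx) say how many.
std::vector<uint8_t> cv_compress_unsigned (uint32_t v)
{
  if (v <= 0x7F)
    return { (uint8_t) v };
  if (v <= 0x3FFF)
    return { (uint8_t) ((v >> 8) | 0x80), (uint8_t) (v & 0xFF) };
  if (v <= 0x1FFFFFFF)
    return { (uint8_t) ((v >> 24) | 0xC0), (uint8_t) ((v >> 16) & 0xFF),
             (uint8_t) ((v >> 8) & 0xFF), (uint8_t) (v & 0xFF) };
  return {};  // too large to encode
}

// Signed values are folded first: magnitude shifted left, sign in bit 0.
std::vector<uint8_t> cv_compress_signed (int32_t v)
{
  uint32_t folded = v >= 0
    ? static_cast<uint32_t> (v) << 1
    : (static_cast<uint32_t> (-static_cast<int64_t> (v)) << 1) | 1u;
  return cv_compress_unsigned (folded);
}
```

Small line and address deltas, which dominate in practice, therefore fit in a single byte.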
gcc/
* configure.ac (HAVE_GAS_CV_UCOMP): New check.
* configure: Regenerate.
* config.in: Regenerate.
* dwarf2codeview.cc (enum binary_annotation_opcode): Define.
(struct codeview_function): Add htab_next and inline_loc.
(struct cv_func_hasher): Define.
(cv_func_htab): New global variable.
(new_codeview_function): Add new codeview_function to hash table.
(codeview_begin_block): Record location of inline block.
(codeview_end_block): Add dummy source line at end of inline block.
(find_line_function): New function.
(write_binary_annotations): New function.
(write_s_inlinesite): Call write_binary_annotations.
(codeview_debug_finish): Delete cv_func_htab.
testsuite: Silence gcc.dg/pr117806.c for default_packed
On default_packed targets like PRU, spurious warnings are emitted:
...workspace/gcc/gcc/testsuite/gcc.dg/pr117806.c:5:3: warning: 'packed' attribute ignored for field of type 'double' [-Wattributes]
Fix by annotating the excess warnings for default_packed targets.
gcc/testsuite/ChangeLog:
* gcc.dg/pr117806.c: Test may emit excess
errors for default_packed targets.
Andrew Pinski [Sat, 30 Nov 2024 22:09:48 +0000 (14:09 -0800)]
VN: Don't recurse for the same value of `a != 0` [PR117859]
Like r15-5063-g6e84a41622f56c, but this is for the `a != 0` case.
After vn_valueize was added to the handling of the `a ==/!= 0` case
of insert_predicates_for_cond, it could go into an infinite loop,
as the value number for a could be the same as that of
the whole expression. This avoids that recursion, so there is
no infinite loop here.
Note that LIM was originally introducing `bool_var2 = bool_var1 != 0`, but
with the GIMPLE testcase in pr117859-2.c there is no dependency on what
earlier passes do.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117859
gcc/ChangeLog:
* tree-ssa-sccvn.cc (insert_predicates_for_cond): If the
valueization for the new lhs for `lhs != 0`
is the same as the old ones, don't recurse.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr117859-1.c: New test.
* gcc.dg/torture/pr117859-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sat, 30 Nov 2024 21:12:13 +0000 (13:12 -0800)]
gimple-lim: Reuse boolean var when moving PHI
While looking into PR 117859, I noticed that LIM
sometimes would produce `bool_var2 = bool_var1 != 0` instead
of just using bool_var1. This patch allows LIM to reuse bool_var1
in the place where bool_var2 was going to be used.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-loop-im.cc (move_computations_worker): While moving
phi, reuse the lhs of the conditional if it is a boolean type.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 1 Dec 2024 05:11:42 +0000 (21:11 -0800)]
testsuite: Fix aarch64/sve/acle/general-c/gnu_vectors_[12].c for taking address of vector element
After the recent changes that made SVE vectors usable as GNU vector
extensions, you can now access each of the elements as if it were an array.
There is no reason why taking the address of such an element should not be
valid too, especially since we are limiting to the first N elements (where N
is the minimum number of elements the architecture supports for these types).
So this removes the error message on these 2 lines and fixes the testcase.
Pushed as obvious after a quick run of these tests on aarch64-linux-gnu.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general-c/gnu_vectors_1.c: Remove
error message on taking address of an element of a vector.
* gcc.target/aarch64/sve/acle/general-c/gnu_vectors_2.c: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 1 Dec 2024 05:04:10 +0000 (21:04 -0800)]
testsuite: Fix aarch64/sve/acle/general-c++/gnu_vectors_[12].C for taking address of vector element
After the recent changes that made SVE vectors usable as GNU vector
extensions, you can now access each of the elements as if it were an array.
There is no reason why taking the address of such an element should not be
valid too, especially since we are limiting to the first N elements (where N
is the minimum number of elements the architecture supports for these types).
So this removes the error message on these 2 lines and fixes the testcase.
Pushed as obvious after a quick run of these tests on aarch64-linux-gnu.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/sve/acle/general-c++/gnu_vectors_1.C: Remove
error message on taking address of an element of a vector.
* g++.target/aarch64/sve/acle/general-c++/gnu_vectors_2.C: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 1 Dec 2024 04:58:14 +0000 (20:58 -0800)]
testsuite: Fix sve-sizeless-[12].C for C++98
In C++98, `{ a }` for aggregates can only mean element-wise
initialization rather than a copy. This adds the expected error
message for SVE vectors for C++98.
Pushed as obvious after a test for aarch64-linux-gnu.
gcc/testsuite/ChangeLog:
* g++.dg/ext/sve-sizeless-1.C: Add error message for line 164
for C++98 only.
* g++.dg/ext/sve-sizeless-2.C: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 1 Dec 2024 04:40:13 +0000 (20:40 -0800)]
testsuite: Fix another issue with sve-sizeless-[12].C
There is a different error message expected on line 165 (for both files).
It was expecting:
error: cannot convert 'svint16_t' to 'sveint8_t' in initialization
But now we get:
error: cannot convert 'svint16_t' to 'signed char' in initialization
This is because we now support constructing scalable vectors, which we
did not before. So just update the error message.
Pushed as obvious after a quick test for aarch64-linux-gnu.
gcc/testsuite/ChangeLog:
* g++.dg/ext/sve-sizeless-1.C: Update error message for line 165.
* g++.dg/ext/sve-sizeless-2.C: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
This patch adds optimization of the following pattern:
(zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
(xor:M (zero_extend:M (subreg:N (X:M))) mask)
... where the mask is GET_MODE_MASK (N).
For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M))) could be simplified to just (X:M)
and the whole optimization will be:
(zero_extend:M (subreg:N (not:M (X:M)))) ->
(xor:M (X:M) mask)
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG))
when X doesn't have any non-zero bits outside of SUBREG mode.
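A scalar model of the RTL identity above, with M taken as 32 bits and N as 8 bits purely for illustration (so GET_MODE_MASK (N) is 0xFF). Casting to uint8_t plays the role of the lowpart subreg, and casting back to uint32_t plays the role of zero_extend.

```cpp
#include <cstdint>

// (zero_extend:SI (subreg:QI (not:SI x)))
uint32_t zext_lowpart_not (uint32_t x)
{
  return (uint32_t) (uint8_t) ~x;
}

// (xor:SI (zero_extend:SI (subreg:QI x)) 0xFF)
uint32_t xor_form (uint32_t x)
{
  return (uint32_t) (uint8_t) x ^ 0xFFu;
}
```

The two functions agree for every x. When x has no bits set above the low byte, (uint32_t) (uint8_t) x is just x, so both reduce to x ^ 0xFF, which is the (xor:M (X:M) mask) form.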
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr112398.c: New test.
* gcc.dg/torture/pr117476-1.c: New test. From Zhendong Su.
* gcc.dg/torture/pr117476-2.c: New test. From Zdenek Sojka.
Jonathan Wakely [Mon, 25 Nov 2024 13:52:19 +0000 (13:52 +0000)]
libstdc++: Move std::monostate to <utility> for C++26 (P0472R2)
Another C++26 paper just approved in Wrocław. The std::monostate class
is defined in <variant> since C++17, but for C++26 it should also be
available in <utility>.
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add bits/monostate.h.
* include/Makefile.in: Regenerate.
* include/std/utility: Include <bits/monostate.h>.
* include/std/variant (monostate, hash<monostate>): Move
definitions to ...
* include/bits/monostate.h: New file.
* testsuite/20_util/headers/utility/synopsis.cc: Add monostate
and hash<monostate> declarations.
* testsuite/20_util/monostate/requirements.cc: New test.
Lewis Hyatt [Mon, 28 Oct 2024 16:52:31 +0000 (12:52 -0400)]
Support for 64-bit location_t: Internal parts
Several of the selftests in diagnostic-show-locus.cc and input.cc are
sensitive to linemap internals. Adjust them here so they will support 64-bit
location_t if configured.
Likewise, handle 64-bit location_t in the support for
-fdump-internal-locations. As was done with the analyzer, convert to
(unsigned long long) explicitly so that 32- and 64-bit can be handled with
the same printf formats.
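A minimal sketch of the printing approach described above: casting to unsigned long long lets a single "%llu" format handle either width. loc32_t and loc64_t are hypothetical stand-ins for the 32-bit and 64-bit location_t configurations.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

using loc32_t = uint32_t;  // stand-in for 32-bit location_t
using loc64_t = uint64_t;  // stand-in for 64-bit location_t

// One format string for both widths, thanks to the explicit cast.
template <typename Loc>
std::string format_location (Loc loc)
{
  char buf[32];
  std::snprintf (buf, sizeof buf, "%llu", (unsigned long long) loc);
  return buf;
}
```

Without the cast, a 32-bit value would need "%u" and a 64-bit one a different, platform-dependent format.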
gcc/ChangeLog:
* diagnostic-show-locus.cc
(test_one_liner_fixit_validation_adhoc_locations): Adapt so it can
effectively test 7-bit ranges instead of 5-bit ranges.
(test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
* input.cc (get_end_location): Adjust types to support 64-bit
location_t.
(write_digit_row): Likewise.
(dump_location_range): Likewise.
(dump_location_info): Likewise.
(class line_table_case): Likewise.
(test_accessing_ordinary_linemaps): Replace some hard-coded
constants with the values defined in line-map.h.
(for_each_line_table_case): Likewise.
Lewis Hyatt [Tue, 26 Nov 2024 16:53:36 +0000 (11:53 -0500)]
Support for 64-bit location_t: toplev parts
With the upcoming move from 32-bit to 64-bit location_t, the recommended
number of range bits will change from 5 to 7. line-map.h now exports the
recommended setting, so use that instead of hard-coding 5.
gcc/ChangeLog:
* toplev.cc (general_init): Replace hard-coded constant with
line_map_suggested_range_bits.