git.ipfire.org Git - thirdparty/gcc.git/log

GCC Administrator [Fri, 18 Aug 2023 00:16:52 +0000 (00:16 +0000)]

Daily bump.

Jonathan Wakely [Thu, 17 Aug 2023 23:26:49 +0000 (00:26 +0100)]

Revert "libstdc++: Reuse double overload of __convert_to_v if possible"

This reverts commit aad83d61d2e92b168688f7b6bd00b8604d11fc9f.

libstdc++-v3/ChangeLog:

* config/locale/generic/c_locale.cc:

Jonathan Wakely [Thu, 17 Aug 2023 19:39:02 +0000 (20:39 +0100)]

libstdc++: Replace global std::string objects in tzdb.cc

When the library is built with --disable-libstdcxx-dual-abi the only
type of std::string supported is the COW string, and the two global
std::string objects in tzdb.cc have to allocate memory. I added them
thinking they would fit in the SSO string buffer, but that's not the
case when the library only uses COW strings.

Replace them with string_view objects to avoid any allocations.
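
As a rough illustration of the difference (this is not the tzdb.cc source), a constexpr std::string_view over a string literal is constant-initialized and never allocates, whereas a global COW std::string has to copy the literal to the heap at startup. The namespace, the file-name values and the tzdata_path helper are assumptions made for this sketch.

```cpp
#include <string>
#include <string_view>

namespace sketch
{
  // Views over string literals: constant-initialized, no allocation.
  // The names mirror the ChangeLog entry; the values are illustrative.
  constexpr std::string_view tzdata_file = "/tzdata.zi";
  constexpr std::string_view leaps_file = "/leapseconds";

  // Hypothetical helper: the only allocation happens here, when a full
  // path is actually needed.
  inline std::string tzdata_path(std::string_view dir)
  {
    std::string path(dir);
    path += tzdata_file;
    return path;
  }
}
```

Because the views refer to literals with static storage duration, they stay valid for the whole program, which is what makes them a safe replacement for global strings.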

libstdc++-v3/ChangeLog:

* src/c++20/tzdb.cc (tzdata_file, leaps_file): Change type to
std::string_view.

Jonathan Wakely [Mon, 24 Jul 2023 10:38:32 +0000 (11:38 +0100)]

libstdc++: Reuse double overload of __convert_to_v if possible

For targets where double and long double have the same representation we
can reuse the same __convert_to_v code for both types. This will
slightly reduce the size of the compiled code in the library.

libstdc++-v3/ChangeLog:

* config/locale/generic/c_locale.cc (__convert_to_v): Reuse
double overload for long double if possible.

Jonathan Wakely [Wed, 19 Jul 2023 08:56:58 +0000 (09:56 +0100)]

libstdc++: Micro-optimize construction of named std::locale

This shaves about 100ns off the std::locale constructor for named
locales (which is only about 1% of the total time).

Using !*s instead of !strcmp(s, "") doesn't make any difference as GCC
optimizes that already even at -O1. !strcmp(s, "C") is optimized at -O2
so replacing that with s[0] == 'C' && s[1] == '\0' only matters for the
--enable-libstdcxx-debug builds. But !strcmp(s, "POSIX") always makes a
call to strcmp at any optimization level. We make that strcmp call,
maybe several times, for any locale name except for "C" (which will be
matched before we get to the check for "POSIX").

For most targets, locale names begin with a lowercase letter and the
only one that begins with 'P' is "POSIX". Replacing !strcmp(s, "POSIX")
with s[0] == 'P' && !strcmp(s+1, "OSIX") means that we avoid calling
strcmp unless the string really does match "POSIX".

Maybe more importantly, I find is_C_locale(s) easier to read than
strcmp(s, "C") == 0 || strcmp(s, "POSIX") == 0, and !is_C_locale(s)
easier to read than strcmp(s, "C") != 0 && strcmp(s, "POSIX") != 0.
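
A minimal sketch of such a helper, built only from the two replacement expressions quoted above. The exact body in localename.cc may differ, so treat this as an illustration rather than the committed code.

```cpp
#include <cstring>

// Returns true for the names "C" and "POSIX" only.
// strcmp is reached only when the name starts with 'P'.
inline bool is_C_locale(const char* s)
{
  if (s[0] == 'C' && s[1] == '\0')
    return true;
  return s[0] == 'P' && !std::strcmp(s + 1, "OSIX");
}
```

For a typical name such as "en_US.UTF-8" both tests fail on the first character, so no library call is made.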

libstdc++-v3/ChangeLog:

* src/c++98/localename.cc (is_C_locale): New function.
(locale::locale(const char*)): Use is_C_locale.

Jonathan Wakely [Tue, 8 Aug 2023 15:31:42 +0000 (16:31 +0100)]

libstdc++: Optimize std::string::assign(Iter, Iter) [PR110945]

Calling string::assign(Iter, Iter) with "foreign" iterators (not the
string's own iterator or pointer types) currently constructs a temporary
string and then calls replace to copy the characters from it. That means
we copy from the iterators twice, and if the replace operation has to
grow the string then we also allocate twice.

By using *this = basic_string(first, last, get_allocator()) we only
perform a single allocation+copy and then do a cheap move assignment
instead of a second copy (and possible allocation). But that alternative
has to be done conditionally, so that we don't pessimize the native
iterator case (the string's own iterator and pointer types) which
currently selects efficient overloads of replace which will not allocate
at all if the string already has sufficient capacity. For C++20 we can
extend that efficient case to work for any contiguous iterator with the
right value type, not just for the string's native iterators.

So the change is to inline the code that decides whether to work in
place or to allocate+copy (instead of deciding that via overload
resolution for replace), and for the allocate+copy case do a move
assignment instead of another call to replace.
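
The allocate+copy path can be sketched from user code as follows. The helper name assign_via_temporary is hypothetical and the real dispatch lives inside basic_string::assign; this only shows why one construction plus a move assignment is enough for foreign iterators.

```cpp
#include <list>    // only needed for the usage shown below
#include <string>
#include <utility>

// Hypothetical helper: build the new contents once, then move them in.
// The characters are read from [first, last) exactly one time.
template<typename InputIt>
void assign_via_temporary(std::string& s, InputIt first, InputIt last)
{
  std::string tmp(first, last, s.get_allocator());
  s = std::move(tmp);  // cheap: no second copy of the characters
}
```

For example, calling it with the begin() and end() iterators of a std::list<char> replaces the string's contents with the list's characters.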

For C++98 there is no change, as we can't do an efficient move
assignment anyway, so keep the current code.

We can also simplify assign(initializer_list<CharT>) because the backing
array for an initializer_list is always disjoint from *this, so most of
the code in _M_replace is not needed.

libstdc++-v3/ChangeLog:

PR libstdc++/110945
* include/bits/basic_string.h (basic_string::assign(Iter, Iter)):
Dispatch to _M_replace or move assignment from a temporary,
based on the iterator type.

Jonathan Wakely [Mon, 7 Aug 2023 13:06:59 +0000 (14:06 +0100)]

libstdc++: Add std::formatter specializations for extended float types

This makes it possible to format _Float32, _Float64 etc. in C++20 mode.
Previously it was only possible to format them in C++23 when the
<stdfloat> typedefs and the std::to_chars overloads were defined.

Instead of relying on std::to_chars for those types, we can just reuse
the formatters for float, double and long double. This also avoids
template bloat by reusing the same specializations instead of
instantiating __formatter_fp for every different type.

libstdc++-v3/ChangeLog:

* include/std/format (formatter): Add partial specializations
for extended floating-point types.
* testsuite/std/format/functions/format.cc: Move test_float128()
to ...
* testsuite/std/format/formatter/ext_float.cc: New test.

Jonathan Wakely [Mon, 7 Aug 2023 11:52:57 +0000 (12:52 +0100)]

libstdc++: Define std::numeric_limits<_FloatNN> before C++23

The extended floating-point types such as _Float32 are supported by GCC
prior to C++23; you just can't use the standard-conforming names from
<stdfloat> to refer to them. This change defines the specializations of
std::numeric_limits for those types for older dialects, not only for
C++23.

libstdc++-v3/ChangeLog:

* include/bits/c++config (__gnu_cxx::__bfloat16_t): Define
whenever __BFLT16_DIG__ is defined, not only for C++23.
* include/std/limits (numeric_limits<bfloat16_t>): Likewise.
(numeric_limits<_Float16>, numeric_limits<_Float32>)
(numeric_limits<_Float64>): Likewise for other extended
floating-point types.

Jonathan Wakely [Tue, 15 Aug 2023 10:54:25 +0000 (11:54 +0100)]

libstdc++: Fix -Wunused-parameter in <experimental/internet>

libstdc++-v3/ChangeLog:

* include/experimental/internet (address_v4::to_string): Remove
unused parameter name.

Jonathan Wakely [Thu, 17 Aug 2023 17:27:15 +0000 (18:27 +0100)]

libstdc++: Make __cmp_cat::__unseq constructor consteval

This constructor should only ever be used with a literal 0 as the
argument, so we can make it consteval. This has the nice advantage that
it is expanded immediately in the front end, and so GDB will never step
into the __cmp_cat::__unseq::__unseq(__unseq*) constructor that is
uninteresting and probably confusing to users.

libstdc++-v3/ChangeLog:

* libsupc++/compare (__cmp_cat::__unseq): Make ctor consteval.
* testsuite/18_support/comparisons/categories/zero_neg.cc: Prune
excess errors caused by invalid consteval calls.

Jonathan Wakely [Tue, 15 Aug 2023 15:35:22 +0000 (16:35 +0100)]

libstdc++: Simplify chrono::__units_suffix using std::format

For std::chrono formatting we can simplify __units_suffix by using
std::format_to to generate the "[n/m]s" suffix with the correct
character type and write directly to the output iterator, so it doesn't
need to be widened using ctype. We can't remove the use of ctype::widen
for formatting a time zone abbreviation as a wide string, because that
can contain arbitrary characters that can't be widened by
__to_wstring_numeric.

This also fixes a bug in the chrono formatter for %Z which created a
dangling wstring_view.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__units_suffix_misc): Remove.
(__units_suffix): Return a known suffix as string view, do not
write unknown suffixes to a buffer.
(__fmt_units_suffix): New function that formats the suffix using
std::format_to.
(operator<<, __chrono_formatter::_M_q): Use __fmt_units_suffix.
(__chrono_formatter::_M_Z): Correct lifetime of wstring.

Jonathan Wakely [Tue, 15 Aug 2023 15:35:22 +0000 (16:35 +0100)]

libstdc++: Rework std::format support for wchar_t

This changes how std::format creates wide strings, by replacing uses of
std::ctype<wchar_t>::widen with the recently-added __to_wstring_numeric
helper function. This removes the dependency on the locale, which should
only be used for locale-specific formats such as {:Ld}.

Also disable all the wide string formatting support if the
_GLIBCXX_USE_WCHAR_T macro is not defined. This is consistent with other
wchar_t support being disabled if the library is built without that
macro defined.

libstdc++-v3/ChangeLog:

* include/std/format [_GLIBCXX_USE_WCHAR_T]: Guard all wide
string formatters with this macro.
(__formatter_int::_M_format_int, __formatter_fp::format)
(formatter<const void*, C>::format): Use __to_wstring_numeric
instead of std::ctype::widen.
(__formatter_fp::_M_localize): Use hardcoded wchar_t values
instead of std::ctype::widen.
* testsuite/std/format/functions/format.cc: Add more checks for
wstring formatting of arithmetic types.

Jonathan Wakely [Wed, 16 Aug 2023 15:55:00 +0000 (16:55 +0100)]

libstdc++: Implement std::to_string in terms of std::format (P2587R3)

This change for C++26 affects std::to_string for floating-point
arguments, so that they should be formatted using std::format("{}", v)
instead of using sprintf. The modified specification in the standard
also affects integral arguments, but there's no observable difference
for them, and we already use std::to_chars for them anyway.

To avoid <string> depending on all of <format>, this change actually
just uses std::to_chars directly instead of using std::format. This is
equivalent, because the format spec "{}" doesn't use any of the other
features of std::format.
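
A minimal sketch of the idea, assuming a standard library that provides the floating-point overloads of std::to_chars. The function name to_string_shortest and the buffer size are choices made for this example, not the code in basic_string.h.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// Formats a double with the plain std::to_chars overload, which produces
// the shortest representation that round-trips.
inline std::string to_string_shortest(double value)
{
  char buf[64];  // ample for the shortest form of any double
  const auto res = std::to_chars(buf, buf + sizeof buf, value);
  if (res.ec != std::errc{})
    return {};
  return std::string(buf, res.ptr);
}
```

With the sprintf-based behaviour, std::to_string(0.5) yields "0.500000", while the sketch above yields "0.5".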

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (to_string(floating-point-type)):
Implement using std::to_chars for C++26.
* include/bits/version.def (__cpp_lib_to_string): Define.
* include/bits/version.h: Regenerate.
* testsuite/21_strings/basic_string/numeric_conversions/char/dr1261.cc:
Adjust expected result in C++26 mode.
* testsuite/21_strings/basic_string/numeric_conversions/char/to_string.cc:
Likewise.
* testsuite/21_strings/basic_string/numeric_conversions/wchar_t/dr1261.cc:
Likewise.
* testsuite/21_strings/basic_string/numeric_conversions/wchar_t/to_wstring.cc:
Likewise.
* testsuite/21_strings/basic_string/numeric_conversions/char/to_string_float.cc:
New test.
* testsuite/21_strings/basic_string/numeric_conversions/wchar_t/to_wstring_float.cc:
New test.
* testsuite/21_strings/basic_string/numeric_conversions/version.cc:
New test.

Jonathan Wakely [Mon, 14 Aug 2023 10:56:55 +0000 (11:56 +0100)]

libstdc++: Optimize std::to_string using std::string::resize_and_overwrite

This uses std::string::__resize_and_overwrite to avoid initializing the
string buffer with characters that are immediately overwritten. This
results in about 6% better performance for the std_to_string case in
int-benchmark.cc from https://github.com/fmtlib/format-benchmark

This requires a change to a testcase. The previous implementation
guaranteed that the string returned from std::to_string(integral-type)
would have no excess capacity, because it was constructed with the
correct length. The new implementation constructs an empty string and
then resizes it with resize_and_overwrite, which over-allocates. This
means that the "no excess capacity" guarantee no longer holds.

We can also greatly improve the performance of std::to_wstring by using
std::to_string and then widening it with a new helper function, instead
of using std::swprintf to do the formatting.
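
The to_wstring approach can be sketched as below. widen_numeric is a hypothetical stand-in for the new helper, and it assumes that the characters of a formatted number (digits, sign, and so on) have the same code in the narrow and wide execution character sets.

```cpp
#include <string>

// Hypothetical stand-in for the library helper: widen a string that holds
// only the characters produced when formatting a number.
// Assumption: each such char has the same value as its wchar_t counterpart.
inline std::wstring widen_numeric(const std::string& s)
{
  return std::wstring(s.begin(), s.end());
}

// Format with the fast narrow path, then widen the result.
inline std::wstring to_wstring_sketch(int value)
{
  return widen_numeric(std::to_string(value));
}
```

No locale is consulted at any point, which is the property the surrounding commits rely on.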

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (to_string(integral-type)): Use
resize_and_overwrite when available.
(__to_wstring_numeric): New helper functions.
(to_wstring): Use std::to_string then __to_wstring_numeric.
* testsuite/21_strings/basic_string/numeric_conversions/char/to_string_int.cc:
Remove check for no excess capacity.

Jonathan Wakely [Wed, 16 Aug 2023 17:22:38 +0000 (18:22 +0100)]

libstdc++: Define std::string::resize_and_overwrite for C++11 and COW string

There are several places in the library where we can improve performance
using resize_and_overwrite so it's inconvenient only being able to use
it in C++23 mode, and only for cxx11 strings. This adds it for COW
strings, and also adds __resize_and_overwrite as an extension for C++11
mode.

The new __resize_and_overwrite is available for C++11 and later, so
within the library we can use that consistently even in C++23. In order
to avoid making a copy (which might not be possible for non-copyable,
non-movable types) the callable is passed to resize_and_overwrite as an
lvalue reference. Unlike wrapping it in std::ref(op) this ensures that
invoking it as std::move(op)(n, p) will use the correct value category.
It also avoids any overhead that would be added by wrapping it in a
lambda like [&op](auto p, auto n) { return std::move(op)(p, n); }.

Adjust std::format to use the new __resize_and_overwrite, which we can
assume exists because we only use std::basic_string<char> and
std::basic_string<wchar_t>, so no program-defined specializations.

The uses in <experimental/internet> cannot be replaced, because those
are type-dependent on an Allocator template parameter, which could mean
they use program-defined specializations of std::basic_string that don't
have the __resize_and_overwrite extension.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (__resize_and_overwrite): New
function.
* include/bits/basic_string.tcc (__resize_and_overwrite): New
function.
(resize_and_overwrite): Simplify by using reserve instead of
growing the string manually. Adjust for C++11 compatibility.
* include/bits/cow_string.h (resize_and_overwrite): New
function.
(__resize_and_overwrite): New function.
* include/bits/version.def (__cpp_lib_string_resize_and_overwrite):
Do not depend on cxx11abi.
* include/bits/version.h: Regenerate.
* include/std/format (__formatter_fp::_S_resize_and_overwrite):
Remove.
(__formatter_fp::format, __formatter_fp::_M_localize): Use
__resize_and_overwrite instead of _S_resize_and_overwrite.
* testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite.cc:
Adjust for C++11 compatibility when included by ...
* testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite_ext.cc:
New test.

Andrew MacLeod [Thu, 17 Aug 2023 15:13:14 +0000 (11:13 -0400)]

Fix range-ops operator_addr.

Lack of symbolic information prevents op1_range from being able to draw
the same conclusions as fold_range can.

PR tree-optimization/111009
gcc/
* range-op.cc (operator_addr_expr::op1_range): Be more restrictive.

gcc/testsuite/
* gcc.dg/pr111009.c: New.

Patrick O'Neill [Wed, 16 Aug 2023 18:55:41 +0000 (11:55 -0700)]

RISCV: Add rotate immediate regression test

This adds new regression tests to ensure half-register rotations are
correctly optimized into rori instructions.
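
The new test files are not shown here, so the following is only a guess at the kind of source pattern involved: a rotation by half the register width, written as a pair of shifts. The function names are made up for this sketch.

```cpp
#include <cstdint>

// Rotate a 64-bit value by 32 bits (half the register width).
inline std::uint64_t rotate64_by_half(std::uint64_t x)
{
  return (x >> 32) | (x << 32);
}

// Rotate a 32-bit value by 16 bits.
inline std::uint32_t rotate32_by_half(std::uint32_t x)
{
  return (x >> 16) | (x << 16);
}
```

For a half-width rotation the left and right forms are the same operation, so a compiler targeting the Zbb extension could select a single rotate-immediate instruction for either spelling.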

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-rol-ror-08.c: New test.
* gcc.target/riscv/zbb-rol-ror-09.c: New test.

Co-authored-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

Patrick Palka [Thu, 17 Aug 2023 16:56:32 +0000 (12:56 -0400)]

libstdc++: Implement P2770R0 changes to join_view / join_with_view

This C++23 paper fixes an issue in these views when adapting a certain
kind of non-forward range, and we treat it as a DR against C++20.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/bits/regex.h (regex_iterator::iterator_concept):
Define for C++20 as per P2770R0.
(regex_token_iterator::iterator_concept): Likewise.
* include/std/ranges (__detail::__as_lvalue): Define.
(join_view::_Iterator): Befriend join_view.
(join_view::_Iterator::_M_satisfy): Use _M_get_outer
instead of _M_outer.
(join_view::_Iterator::_M_get_outer): Define.
(join_view::_Iterator::_Iterator): Split constructor taking
_Parent argument into two as per P2770R0.  Remove constraint on
default constructor.
(join_view::_Iterator::_M_outer): Make this data member present
only when the underlying range is forward.
(join_view::_Iterator::operator++): Use _M_get_outer instead of
_M_outer.
(join_view::_Iterator::operator--): Use __as_lvalue helper.
(join_view::_Iterator::operator==): Adjust constraints as per
P2770R0.
(join_view::_Sentinel::__equal): Use _M_get_outer instead of
_M_outer.
(join_view::_M_outer): New data member when the underlying range
is non-forward.
(join_view::begin): Adjust definition as per P2770R0.
(join_view::end): Likewise.
(join_with_view::_M_outer_it): New data member when the
underlying range is non-forward.
(join_with_view::begin): Adjust definition as per P2770R0.
(join_with_view::end): Likewise.
(join_with_view::_Iterator::_M_outer_it): Make this data member
present only when the underlying range is forward.
(join_with_view::_Iterator::_M_get_outer): Define.
(join_with_view::_Iterator::_Iterator): Split constructor
taking _Parent argument into two as per P2770R0.  Remove
constraint on default constructor.
(join_with_view::_Iterator::_M_update_inner): Adjust definition
as per P2770R0.
(join_with_view::_Iterator::_M_get_inner): Likewise.
(join_with_view::_Iterator::_M_satisfy): Adjust calls to
_M_get_inner.  Use _M_get_outer instead of _M_outer_it.
(join_with_view::_Iterator::operator==): Adjust constraints
as per P2770R0.
(join_with_view::_Sentinel::operator==): Use _M_get_outer
instead of _M_outer_it.
* testsuite/std/ranges/adaptors/p2770r0.cc: New test.

Patrick Palka [Thu, 17 Aug 2023 16:40:04 +0000 (12:40 -0400)]

libstdc++: Convert _RangeAdaptorClosure into a CRTP base [PR108827]

Using the CRTP idiom for this base class avoids bloating the size of a
pipeline when adding distinct empty range adaptor closure objects to it,
as detailed in section 4.1 of P2387R3.

But it means we can no longer define its operator| overloads as hidden
friends, since it'd mean each instantiation of _RangeAdaptorClosure
introduces its own distinct set of hidden friends.  So e.g. for the
outer | in

  x | (views::reverse | views::join)

ADL would find 6 distinct hidden operator| friends:

  two from _RangeAdaptorClosure<_Reverse>
  two from _RangeAdaptorClosure<_Join>
  two from _RangeAdaptorClosure<_Pipe<_Reverse, _Join>>

but we really only want to consider the last two.

We avoid this issue by instead defining the operator| overloads at
namespace scope alongside _RangeAdaptorClosure.  This should be fine
because the only types defined in this namespace are _RangeAdaptorClosure,
_RangeAdaptor, _Pipe and _Partial, so we don't have to worry about
unintentional ADL.
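
The size argument can be illustrated with two toy hierarchies. These are simplified stand-ins, not the libstdc++ classes, and the exact sizes depend on the ABI. With a single shared empty base, the pipe's first member cannot sit at the same address as the pipe's own base subobject of the same type; with CRTP every base is a distinct type, so they can overlap.

```cpp
namespace plain
{
  // Every closure shares one empty base type.
  struct ClosureBase { };
  struct Reverse : ClosureBase { };
  struct Join : ClosureBase { };
  // lhs contains a ClosureBase subobject, so it cannot be placed at the
  // same address as Pipe's own ClosureBase base.
  struct Pipe : ClosureBase { Reverse lhs; Join rhs; };
}

namespace crtp
{
  // Each closure gets its own distinct empty base type.
  template<typename Derived> struct ClosureBase { };
  struct Reverse : ClosureBase<Reverse> { };
  struct Join : ClosureBase<Join> { };
  // ClosureBase<Pipe> and ClosureBase<Reverse> are different types, so
  // lhs may share the address of the base subobject.
  struct Pipe : ClosureBase<Pipe> { Reverse lhs; Join rhs; };
}
```

On ABIs that apply the empty base optimization as usual, crtp::Pipe is expected to be smaller than plain::Pipe.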

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
PR libstdc++/108827

libstdc++-v3/ChangeLog:

* include/std/ranges (__adaptor::_RangeAdaptorClosure):
Convert into a CRTP class template.  Move hidden operator|
friends into namespace scope and adjust their constraints.
(__closure::__is_range_adaptor_closure_fn): Define.
(__closure::__is_range_adaptor_closure): Define.
(__adaptor::_Partial): Adjust use of _RangeAdaptorClosure.
(__adaptor::_Pipe): Likewise.
(views::_All): Likewise.
(views::_Join): Likewise.
(views::_Common): Likewise.
(views::_Reverse): Likewise.
(views::_Elements): Likewise.
(views::_Adjacent): Likewise.
(views::_AsRvalue): Likewise.
(views::_Enumerate): Likewise.
(views::_AsConst): Likewise.
* testsuite/std/ranges/adaptors/all.cc: Reinstate assertion
expecting that adding empty range adaptor closure objects to a
pipeline doesn't increase the size of a pipeline.

Vladimir N. Makarov [Thu, 17 Aug 2023 15:57:45 +0000 (11:57 -0400)]

[LRA]: When assigning stack slots to pseudos previously assigned to fp consider other spilled pseudos

The previous LRA patch can assign a slot of conflicting pseudos to
pseudos spilled after prohibiting fp->sp elimination. This patch
fixes this problem.

gcc/ChangeLog:

* lra-spills.cc (assign_stack_slot_num_and_sort_pseudos): Move
slots_num initialization from here ...
(lra_spill): ... to here before the 1st call of
assign_stack_slot_num_and_sort_pseudos. Add the 2nd call after
fp->sp elimination.

Jose E. Marchesi [Thu, 17 Aug 2023 13:36:26 +0000 (15:36 +0200)]

Add warning options -W[no-]compare-distinct-pointer-types

GCC emits pedwarns unconditionally when comparing pointers of
different types, for example:

  int xdp_context (struct xdp_md *xdp)
  {
    void *data = (void *)(long)xdp->data;
    __u32 *metadata = (void *)(long)xdp->data_meta;
    __u32 ret;

    if (metadata + 1 > data)
      return 0;
    return 1;
  }

  /home/jemarch/foo.c: In function ‘xdp_context’:
  /home/jemarch/foo.c:15:20: warning: comparison of distinct pointer types lacks a cast
     15 |   if (metadata + 1 > data)
        |                    ^

LLVM supports an option -W[no-]compare-distinct-pointer-types that can
be used in order to enable or disable the emission of such warnings.
It is enabled by default.

This patch adds the same options to GCC.

Documentation and testsuite updates are included.
Regtested on x86_64-linux-gnu.
No regressions observed.

gcc/ChangeLog:

PR c/106537
* doc/invoke.texi (Option Summary): Mention
-Wcompare-distinct-pointer-types under `Warning Options'.
(Warning Options): Document -Wcompare-distinct-pointer-types.

gcc/c-family/ChangeLog:

PR c/106537
* c.opt (Wcompare-distinct-pointer-types): New option.

gcc/c/ChangeLog:

PR c/106537
* c-typeck.cc (build_binary_op): Warn about comparing distinct
pointer types only when -Wcompare-distinct-pointer-types is enabled.

gcc/testsuite/ChangeLog:

PR c/106537
* gcc.c-torture/compile/pr106537-1.c: New test.
* gcc.c-torture/compile/pr106537-2.c: Likewise.
* gcc.c-torture/compile/pr106537-3.c: Likewise.

Jan-Benedict Glaw [Thu, 17 Aug 2023 13:54:30 +0000 (15:54 +0200)]

Fix code_helper unused argument warning for fr30

fr30 is the only target defining GO_IF_LEGITIMATE_ADDRESS right now, in
which case the `code_helper ch` argument to memory_address_addr_space_p()
is unused and emits a new warning.

gcc/ChangeLog:
* recog.cc (memory_address_addr_space_p): Mark possibly unused
argument as unused.

Tsukasa OI [Thu, 17 Aug 2023 13:52:14 +0000 (07:52 -0600)]

[PATCH] RISC-V: Deduplicate #error messages in testsuite

"#error Feature macro not defined" is required to test the existence of an
extension through the preprocessor. However, multiple occurrences of the
exact same error message will confuse the developer once an error is
encountered.

This commit replaces such error messages with
"#error Feature macro for `EXT' not defined" to make clear which
macro is missing.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvkn.c: Deduplicate #error messages.
* gcc.target/riscv/zvkn-1.c: Ditto.
* gcc.target/riscv/zvknc.c: Ditto.
* gcc.target/riscv/zvknc-1.c: Ditto.
* gcc.target/riscv/zvknc-2.c: Ditto.
* gcc.target/riscv/zvkng.c: Ditto.
* gcc.target/riscv/zvkng-1.c: Ditto.
* gcc.target/riscv/zvkng-2.c: Ditto.
* gcc.target/riscv/zvks.c: Ditto.
* gcc.target/riscv/zvks-1.c: Ditto.
* gcc.target/riscv/zvksc.c: Ditto.
* gcc.target/riscv/zvksc-1.c: Ditto.
* gcc.target/riscv/zvksc-2.c: Ditto.
* gcc.target/riscv/zvksg.c: Ditto.
* gcc.target/riscv/zvksg-1.c: Ditto.
* gcc.target/riscv/zvksg-2.c: Ditto.

Richard Biener [Thu, 17 Aug 2023 11:10:14 +0000 (13:10 +0200)]

tree-optimization/111039 - abnormals and bit test merging

The following guards the bit test merging code in if-combine against
the appearance of SSA names used in abnormal PHIs.

PR tree-optimization/111039
* tree-ssa-ifcombine.cc (ifcombine_ifandif): Check for
SSA_NAME_OCCURS_IN_ABNORMAL_PHI.

* gcc.dg/pr111039.c: New testcase.

Tobias Burnus [Thu, 17 Aug 2023 13:20:55 +0000 (15:20 +0200)]

libgomp: call numa_available first when using libnuma

The documentation requires that numa_available() is called first, and
other libnuma functions may be called only when it succeeds. Internally,
it does a syscall to get_mempolicy with flag=0 (which would return
the default policy if mode were not NULL). If this returns -1 (and
not 0) and errno == ENOSYS, the Linux kernel does not have the
get_mempolicy syscall function; if so, numa_available() returns -1
(otherwise: 0).

libgomp/

PR libgomp/111024
* allocator.c (gomp_init_libnuma): Call numa_available; if
not available or not returning 0, disable libnuma usage.

Alex Coplan [Thu, 17 Aug 2023 13:08:31 +0000 (14:08 +0100)]

doc: Fixes to RTL-SSA sample code

This patch fixes up the code examples in the RTL-SSA documentation (the
sections on making insn changes) to reflect the current API.

The main issues are as follows:
- rtl_ssa::recog takes an obstack_watermark & as the first parameter.
   Presumably this is intended to be the change attempt, so I've updated
   the examples to pass this through.
- The variants of recog and restrict_movement that take an ignore
   predicate have been renamed with an _ignoring suffix, so I've
   updated callers to use those names.
- A couple of minor "obvious" fixes to add a missing address-of
   operator and correct a variable name.

gcc/ChangeLog:

* doc/rtl.texi: Fix up sample code for RTL-SSA insn changes.

Lehua Ding [Thu, 17 Aug 2023 11:37:17 +0000 (19:37 +0800)]

RISC-V: Fix XPASS slp testcases

This patch fixes XPASS slp testcases on trunk by
making the conditions for xfail stricter.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Fix.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: Ditto.

Jose E. Marchesi [Thu, 17 Aug 2023 12:19:15 +0000 (14:19 +0200)]

bpf: support `naked' function attributes in BPF targets

The kernel selftests and other BPF programs make extensive use of the
`naked' function attribute with bodies written using basic inline
assembly. This patch adds support for the attribute to
bpf-unknown-none, makes it inhibit warnings due to the lack of an explicit
`return' statement, and updates the documentation and testsuite
accordingly.

Tested in x86_64-linux-gnu host and bpf-unknown-none target.

gcc/ChangeLog

PR target/111046
* config/bpf/bpf.cc (bpf_attribute_table): Add entry for the
`naked' function attribute.
(bpf_warn_func_return): New function.
(TARGET_WARN_FUNC_RETURN): Define.
(bpf_expand_prologue): Add preventive comment.
(bpf_expand_epilogue): Likewise.
* doc/extend.texi (BPF Function Attributes): Document the `naked'
function attribute.

gcc/testsuite/ChangeLog

* gcc.target/bpf/naked-1.c: New test.

Jonathan Wakely [Thu, 17 Aug 2023 12:02:27 +0000 (13:02 +0100)]

libstdc++: Fix std::format("{:F}", inf) to use uppercase

std::format was treating {:f} and {:F} identically on the basis that for
the fixed 1.234567 format there are no alphabetical characters that need
to be in uppercase. But that's wrong for infinities and NaNs, which
should be formatted as "INF" and "NAN" for {:F}.
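
The same distinction exists in the C library's printf family, which makes for an easy illustration without needing <format>. This sketch uses std::snprintf rather than std::format, and the helper name printf_fixed is made up for the example. Note that C lets an implementation print infinity as either "inf" or "infinity"; common C libraries use the short form.

```cpp
#include <cstdio>
#include <limits>
#include <string>

// Formats a double with %f or %F. For finite values the two agree, but
// infinities and NaNs follow the case of the conversion specifier.
inline std::string printf_fixed(double value, bool upper)
{
  char buf[512];  // large enough for any double in fixed notation
  if (upper)
    std::snprintf(buf, sizeof buf, "%F", value);
  else
    std::snprintf(buf, sizeof buf, "%f", value);
  return buf;
}
```

Passing std::numeric_limits<double>::infinity() with upper set to true gives the uppercase spelling that {:F} is expected to match.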

libstdc++-v3/ChangeLog:

* include/std/format (__format::_Pres_type): Add _Pres_F.
(__formatter_fp::parse): Use _Pres_F for 'F'.
(__formatter_fp::format): Set __upper for _Pres_F.
* testsuite/std/format/functions/format.cc: Check formatting of
infinity and NaN for each presentation type.

Jonathan Wakely [Thu, 17 Aug 2023 10:17:14 +0000 (11:17 +0100)]

libstdc++: Regenerate Makefile.in

libstdc++-v3/ChangeLog:

* include/Makefile.in: Regenerate.

Richard Biener [Tue, 15 Aug 2023 13:17:08 +0000 (15:17 +0200)]

Handle TYPE_OVERFLOW_UNDEFINED vectorized BB reductions

The following changes the gate to perform vectorization of BB reductions
to use needs_fold_left_reduction_p which in turn requires handling
TYPE_OVERFLOW_UNDEFINED types in the epilogue code generation by
promoting any operations generated there to use unsigned arithmetic.

The following does this. Currently v16qi is the only integral mode for
which x86 supports a .REDUC_PLUS reduction, so I had to add an
x86-specific testcase using GIMPLE IL.
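
Why the unsigned promotion matters can be shown at source level. The function below is a hypothetical analogue, not vectorizer output: it adds four signed values in a reassociated order, which would overflow for some inputs if done in int, but is well-defined when the intermediate sums are unsigned.

```cpp
#include <climits>

// Sums v[0..3] as (v0 + v2) + (v1 + v3) instead of left to right.
// Intermediate results use unsigned arithmetic, which wraps instead of
// overflowing, so the regrouping cannot introduce undefined behaviour.
inline int reduce4_reassociated(const int (&v)[4])
{
  using U = unsigned int;
  const U even = U(v[0]) + U(v[2]);
  const U odd = U(v[1]) + U(v[3]);
  // The final conversion is in range whenever the left-to-right signed
  // sum of the inputs is itself representable.
  return static_cast<int>(even + odd);
}
```

For the inputs {INT_MAX, -1, 1, 0} the left-to-right signed sum never overflows, yet v0 + v2 would overflow as an int; the unsigned version still produces INT_MAX.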

* tree-vect-slp.cc (vect_slp_check_for_roots): Use
!needs_fold_left_reduction_p to decide whether we can
handle the reduction with association.
(vectorize_slp_instance_root_stmt): For TYPE_OVERFLOW_UNDEFINED
reductions perform all arithmetic in an unsigned type.

* gcc.target/i386/vect-reduc-2.c: New testcase.

benjamin priour [Tue, 15 Aug 2023 18:25:42 +0000 (20:25 +0200)]

testsuite: Remove unused dg-line in ce8cdf5bcf96a2db6d7b9f656fc9ba58d7942a83

Test case g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C
introduced by patch ce8cdf5bcf96a2db6d7b9f656fc9ba58d7942a83
emitted a warning for an unused dg-line variable.
This fixes up the blunder.

Signed-off-by: benjamin priour <vultkayn@gcc.gnu.org>
gcc/testsuite/ChangeLog:

* g++.dg/analyzer/fanalyzer-show-events-in-system-headers.C:
Remove dg-line var declare_a.

Rainer Orth [Thu, 17 Aug 2023 08:16:57 +0000 (10:16 +0200)]

fixincludes: Update darwin_flt_eval_method for macOS 14

On macOS 14, a guard in <math.h> changed:

-- MacOSX13.3.sdk/usr/include/math.h 2023-04-19 01:54:44
+++ MacOSX14.0.sdk/usr/include/math.h 2023-08-01 08:42:43
@@ -22,0 +23 @@
+
@@ -43 +44 @@
-#if __FLT_EVAL_METHOD__ == 0
+#if __FLT_EVAL_METHOD__ == 0 || __FLT_EVAL_METHOD__ == -1
@@ -49 +50 @@
-#elif __FLT_EVAL_METHOD__ == 2 || __FLT_EVAL_METHOD__ == -1
+#elif __FLT_EVAL_METHOD__ == 2

Therefore the darwin_flt_eval_method fixincludes fix doesn't match any
longer, leading to a large number of testsuite failures like

/private/var/gcc/regression/master/14-gcc/build/gcc/include-fixed/math.h:69:5:
error: #error "Unsupported value of __FLT_EVAL_METHOD__."

where __FLT_EVAL_METHOD__ = 16.

This patch adjusts the fix to allow for both forms.

Tested with make check in fixincludes on x86_64-apple-darwin23.0.0 and
verifying that <math.h> has indeed been fixed as expected.

2023-08-16 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

fixincludes:
* inclhack.def (darwin_flt_eval_method): Handle macOS 14 guard
variant.
* fixincl.x: Regenerate.
* tests/base/math.h [DARWIN_FLT_EVAL_METHOD_CHECK]: Update test.

Rainer Orth [Thu, 17 Aug 2023 08:14:49 +0000 (10:14 +0200)]

build: Allow for Xcode 15 ld -v output

Since Xcode 15 beta 6, ld -v output differs from previous versions:

* macOS 13/Xcode 14:

  @(#)PROGRAM:ld  PROJECT:ld64-857.1

* macOS 14/Xcode 15:

  @(#)PROGRAM:ld  PROJECT:dyld-1015.1

configure cannot handle the new form, so LD64_VERSION isn't set.

This patch fixes this.  The autoconf manual states that sed doesn't
portably support alternation, so I'm using two separate expressions to
extract the version number.
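
The actual change is a pair of sed expressions in configure.ac. As a C++ analogue of the same idea (trying each known prefix in turn instead of using alternation), the extraction could look like the sketch below. The function name ld64_version is made up for this example.

```cpp
#include <string>
#include <string_view>

// Returns the version that follows "PROJECT:ld64-" or "PROJECT:dyld-" in
// a line of `ld -v` output, or an empty string if neither form is found.
inline std::string ld64_version(std::string_view line)
{
  const std::string_view tags[] = {"PROJECT:ld64-", "PROJECT:dyld-"};
  for (std::string_view tag : tags)
  {
    const auto pos = line.find(tag);
    if (pos == std::string_view::npos)
      continue;
    const std::string_view rest = line.substr(pos + tag.size());
    // Keep the leading run of digits and dots.
    const auto end = rest.find_first_not_of("0123456789.");
    return std::string(rest.substr(0, end));
  }
  return {};
}
```

Applied to the two sample lines above, this would extract 857.1 and 1015.1 respectively.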

Tested on x86_64-apple-darwin23.0.0.

2023-08-16  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc:
* configure.ac (gcc_cv_ld64_version): Allow for dyld in ld -v
output.
* configure: Regenerate.
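
The "two separate expressions" approach can be sketched outside of sed as follows. This is a minimal illustration in C++, not the configure.ac code: `version_after` and `ld64_version` are hypothetical helpers, and the only inputs assumed are the two `ld -v` forms quoted above.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns the text that follows `prefix` up to the next
// whitespace character, or nullopt if `prefix` does not occur in `line`.
std::optional<std::string> version_after(std::string_view line,
                                         std::string_view prefix)
{
  const auto pos = line.find(prefix);
  if (pos == std::string_view::npos)
    return std::nullopt;
  line.remove_prefix(pos + prefix.size());
  const auto end = line.find_first_of(" \t\r\n");
  return std::string(line.substr(0, end));
}

// Tries the older ld64 form first, then the newer dyld form, instead of
// relying on a single pattern with alternation.
std::optional<std::string> ld64_version(std::string_view ld_v_output)
{
  if (auto v = version_after(ld_v_output, "PROJECT:ld64-"))
    return v;
  return version_after(ld_v_output, "PROJECT:dyld-");
}
```

Trying each form in turn keeps every individual match trivial, which mirrors the reason given above for avoiding alternation in sed.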

commit | commitdiff | tree

Jonathan Wakely [Wed, 16 Aug 2023 20:29:46 +0000 (21:29 +0100)]

libstdc++: Disable PCH for tests that rely on include order

These tests expect to be able to #undef a feature test macro and then
include <version> to get it redefined. But if <version> has already been
included by the <bits/stdc++.h> PCH then including it again does nothing
and the macro remains undefined.

libstdc++-v3/ChangeLog:

* testsuite/24_iterators/move_iterator/p2520r0.cc: Add no_pch.
* testsuite/std/format/functions/format.cc: Likewise.
* testsuite/std/format/functions/format_c++23.cc: Likewise.

commit | commitdiff | tree

Jonathan Wakely [Wed, 16 Aug 2023 20:46:05 +0000 (21:46 +0100)]

libstdc++: Fix testsuite no_pch directive

The { dg-add-options no_pch } directive is supposed to add a macro
definition that invalidates the PCH file, and ensures that the #include
directives in the test file are processed as written. But the proc that
adds the options actually removes all existing options, cancelling out
any previous dg-options directive.

This means that using no_pch will cause FAILs in a file that relies on
other options set by an earlier dg-options.

The no_pch directive was added for PR libstdc++/21769 where Janis
suggested adding it as return "$flags -D__GLIBCXX__=99999999" but what
was actually committed didn't include the $flags so replaced them.

Additionally, using no_pch only prevents the precompiled version of
<bits/stdc++.h> from being included, it doesn't prevent the
non-precompiled version being included by -include bits/stdc++.h in the
test flags. Use regsub to filter that out of the options as well.

libstdc++-v3/ChangeLog:

* testsuite/lib/dg-options.exp (add_options_for_no_pch): Remove
any "-include bits/stdc++.h" from options and add the macro to
the existing options instead of replacing them.

commit | commitdiff | tree

Pan Li [Thu, 17 Aug 2023 07:21:42 +0000 (15:21 +0800)]

RISC-V: Support RVV VFWREDOSUM.VS rounding mode intrinsic API

This patch adds rounding mode API support for VFWREDOSUM.VS, as in the
samples below.

* __riscv_vfwredosum_vs_f32m1_f64m1_rm
* __riscv_vfwredosum_vs_f32m1_f64m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(widen_freducop): Add frm_opt_type template arg.
(vfwredosum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwredosum_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wredosum.c: New test.

commit | commitdiff | tree

Pan Li [Thu, 17 Aug 2023 06:09:18 +0000 (14:09 +0800)]

RISC-V: Support RVV VFREDOSUM.VS rounding mode intrinsic API

This patch adds rounding mode API support for VFREDOSUM.VS, as in the
samples below.

* __riscv_vfredosum_vs_f32m1_f32m1_rm
* __riscv_vfredosum_vs_f32m1_f32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfredosum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfredosum_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-redosum.c: New test.

commit | commitdiff | tree

Pan Li [Thu, 17 Aug 2023 03:03:39 +0000 (11:03 +0800)]

RISC-V: Support RVV VFREDUSUM.VS rounding mode intrinsic API

This patch adds rounding mode API support for VFREDUSUM.VS, as in the
samples below.

* __riscv_vfredusum_vs_f32m1_f32m1_rm
* __riscv_vfredusum_vs_f32m1_f32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class freducop): Add frm_op_type template arg.
(vfredusum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfredusum_frm): New intrinsic function def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct reduc_alu_frm_def): New class for frm shape.
(SHAPE): New declaration.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-redusum.c: New test.

commit | commitdiff | tree

Pan Li [Thu, 17 Aug 2023 02:04:51 +0000 (10:04 +0800)]

RISC-V: Support RVV VFNCVT.F.{X|XU|F}.W rounding mode intrinsic API

This patch adds rounding mode API support for VFNCVT.F.{X|XU|F}.W, as in
the samples below.

* __riscv_vfncvt_f_x_w_f32m1_rm
* __riscv_vfncvt_f_x_w_f32m1_rm_m
* __riscv_vfncvt_f_xu_w_f32m1_rm
* __riscv_vfncvt_f_xu_w_f32m1_rm_m
* __riscv_vfncvt_f_f_w_f32m1_rm
* __riscv_vfncvt_f_f_w_f32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfncvt_f): Add frm_op_type template arg.
(vfncvt_f_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfncvt_f_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-ncvt-f.c: New test.

commit | commitdiff | tree

Pan Li [Thu, 17 Aug 2023 01:17:08 +0000 (09:17 +0800)]

RISC-V: Support RVV VFNCVT.XU.F.W rounding mode intrinsic API

This patch adds rounding mode API support for VFNCVT.XU.F.W, as in the
samples below.

* __riscv_vfncvt_xu_f_w_u16mf2_rm
* __riscv_vfncvt_xu_f_w_u16mf2_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfncvt_xu_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfncvt_xu_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-ncvt-xu.c: New test.

commit | commitdiff | tree

Pan Li [Wed, 16 Aug 2023 12:47:38 +0000 (20:47 +0800)]

RISC-V: Support RVV VFNCVT.X.F.W rounding mode intrinsic API

This patch adds rounding mode API support for VFNCVT.X.F.W, as in the
samples below.

* __riscv_vfncvt_x_f_w_i16mf2_rm
* __riscv_vfncvt_x_f_w_i16mf2_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfncvt_x): Add frm_op_type template arg.
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfncvt_x_frm): New intrinsic function def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct narrow_alu_frm_def): New shape function for frm.
(SHAPE): New declaration.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-ncvt-x.c: New test.

commit | commitdiff | tree

Haochen Jiang [Thu, 17 Aug 2023 06:25:53 +0000 (14:25 +0800)]

[Patch 6/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_1-vextractf64x2-1.c: New test.
* gcc.target/i386/avx10_1-vextracti64x2-1.c: Ditto.
* gcc.target/i386/avx10_1-vfpclasspd-1.c: Ditto.
* gcc.target/i386/avx10_1-vfpclassps-1.c: Ditto.
* gcc.target/i386/avx10_1-vinsertf64x2-1.c: Ditto.
* gcc.target/i386/avx10_1-vinserti64x2-1.c: Ditto.
* gcc.target/i386/avx10_1-vrangepd-1.c: Ditto.
* gcc.target/i386/avx10_1-vrangeps-1.c: Ditto.
* gcc.target/i386/avx10_1-vreducepd-1.c: Ditto.
* gcc.target/i386/avx10_1-vreduceps-1.c: Ditto.

commit | commitdiff | tree

Haochen Jiang [Thu, 17 Aug 2023 06:24:59 +0000 (14:24 +0800)]

[Patch 5/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins

gcc/ChangeLog:

* config/i386/avx512vldqintrin.h: Remove target attribute.
* config/i386/i386-builtin.def (BDESC):
Add OPTION_MASK_ISA2_AVX10_1.
* config/i386/sse.md (VF_AVX512VLDQ_AVX10_1): New.
(VFH_AVX512VLDQ_AVX10_1): Ditto.
(VF1_AVX512VLDQ_AVX10_1): Ditto.
(<mask_codefor>reducep<mode><mask_name><round_saeonly_name>):
Change iterator to VFH_AVX512VLDQ_AVX10_1. Remove target check.
(vec_pack<floatprefix>_float_<mode>): Change iterator to
VI8_AVX512VLDQ_AVX10_1. Remove target check.
(vec_unpack_<fixprefix>fix_trunc_lo_<mode>): Change iterator to
VF1_AVX512VLDQ_AVX10_1. Remove target check.
(vec_unpack_<fixprefix>fix_trunc_hi_<mode>): Ditto.
(VI48F_256_DQVL_AVX10_1): Rename from VI48F_256_DQ.
(avx512vl_vextractf128<mode>): Change iterator to
VI48F_256_DQVL_AVX10_1. Remove target check.
(vec_extract_hi_<mode>_mask): Add TARGET_AVX10_1.
(vec_extract_hi_<mode>): Ditto.
(avx512vl_vinsert<mode>): Ditto.
(vec_set_lo_<mode><mask_name>): Ditto.
(vec_set_hi_<mode><mask_name>): Ditto.
(avx512dq_rangep<mode><mask_name><round_saeonly_name>): Change
iterator to VF_AVX512VLDQ_AVX10_1. Remove target check.
(avx512dq_fpclass<mode><mask_scalar_merge_name>): Change
iterator to VFH_AVX512VLDQ_AVX10_1. Remove target check.
* config/i386/subst.md (mask_avx512dq_condition): Add
TARGET_AVX10_1.
(mask_scalar_merge): Ditto.

commit | commitdiff | tree

Haochen Jiang [Thu, 17 Aug 2023 06:24:12 +0000 (14:24 +0800)]

[Patch 4/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_1-abs-copysign-1.c: New test.
* gcc.target/i386/avx10_1-vandpd-1.c: Ditto.
* gcc.target/i386/avx10_1-vandps-1.c: Ditto.
* gcc.target/i386/avx10_1-vcvtps2qq-1.c: Ditto.
* gcc.target/i386/avx10_1-vcvtps2uqq-1.c: Ditto.
* gcc.target/i386/avx10_1-vcvtqq2pd-1.c: Ditto.
* gcc.target/i386/avx10_1-vcvtqq2ps-1.c: Ditto.
* gcc.target/i386/avx10_1-vcvtuqq2pd-1.c: Ditto.
* gcc.target/i386/avx10_1-vcvtuqq2ps-1.c: Ditto.
* gcc.target/i386/avx10_1-vorpd-1.c: Ditto.
* gcc.target/i386/avx10_1-vorps-1.c: Ditto.
* gcc.target/i386/avx10_1-vpmovd2m-1.c: Ditto.
* gcc.target/i386/avx10_1-vpmovm2d-1.c: Ditto.
* gcc.target/i386/avx10_1-vpmovm2q-1.c: Ditto.
* gcc.target/i386/avx10_1-vpmovq2m-1.c: Ditto.
* gcc.target/i386/avx10_1-vxorpd-1.c: Ditto.
* gcc.target/i386/avx10_1-vxorps-1.c: Ditto.

commit | commitdiff | tree

Haochen Jiang [Thu, 17 Aug 2023 06:23:30 +0000 (14:23 +0800)]

[Patch 3/6] Support AVX10.1 for AVX512DQ+AVX512VL intrins

gcc/ChangeLog:

* config/i386/avx512vldqintrin.h: Remove target attribute.
* config/i386/i386-builtin.def (BDESC):
Add OPTION_MASK_ISA2_AVX10_1.
* config/i386/i386.cc (standard_sse_constant_opcode): Add TARGET_AVX10_1.
* config/i386/sse.md: (VI48_AVX512VL_AVX10_1): New.
(VI48_AVX512VLDQ_AVX10_1): Ditto.
(VF2_AVX512VL): Remove.
(VI8_256_512VLDQ_AVX10_1): Rename from VI8_256_512.
Add TARGET_AVX10_1.
(*<code><mode>3<mask_name>): Change isa attribute to
avx10_1_or_avx512dq. Add TARGET_AVX10_1.
(<code><mode>3): Add TARGET_AVX10_1. Change isa attr
to avx10_1_or_avx512vl.
(<mask_codefor>avx512dq_cvtps2qq<mode><mask_name><round_name>):
Change iterator to VI8_256_512VLDQ_AVX10_1. Remove target check.
(<mask_codefor>avx512dq_cvtps2qqv2di<mask_name>):
Add TARGET_AVX10_1.
(<mask_codefor>avx512dq_cvtps2uqq<mode><mask_name><round_name>):
Change iterator to VI8_256_512VLDQ_AVX10_1. Remove target check.
(<mask_codefor>avx512dq_cvtps2uqqv2di<mask_name>):
Add TARGET_AVX10_1.
(float<floatunssuffix><sseintvecmodelower><mode>2<mask_name><round_name>):
Change iterator to VF2_AVX512VLDQ_AVX10_1. Remove target check.
(float<floatunssuffix><sselongvecmodelower><mode>2<mask_name><round_name>):
Change iterator to VF1_128_256VLDQ_AVX10_1. Remove target check.
(float<floatunssuffix>v4div4sf2<mask_name>):
Add TARGET_AVX10_1.
(avx512dq_float<floatunssuffix>v2div2sf2): Ditto.
(*avx512dq_float<floatunssuffix>v2div2sf2): Ditto.
(float<floatunssuffix>v2div2sf2): Ditto.
(float<floatunssuffix>v2div2sf2_mask): Ditto.
(*float<floatunssuffix>v2div2sf2_mask): Ditto.
(*float<floatunssuffix>v2div2sf2_mask_1): Ditto.
(<avx512>_cvt<ssemodesuffix>2mask<mode>):
Change iterator to VI48_AVX512VLDQ_AVX10_1. Remove target check.
(<avx512>_cvtmask2<ssemodesuffix><mode>): Ditto.
(*<avx512>_cvtmask2<ssemodesuffix><mode>):
Change iterator to VI48_AVX512VL_AVX10_1. Remove target check.
Change when constraint is enabled.

commit | commitdiff | tree

Juzhe-Zhong [Thu, 17 Aug 2023 05:59:06 +0000 (13:59 +0800)]

RISC-V: Fix incorrect VTYPE fusion for floating point scalar move insn[PR111037]

#include <stdint.h>
#include <riscv_vector.h>

void foo(_Float16 y, int64_t *i64p)
{
  vint64m1_t vx = __riscv_vle64_v_i64m1 (i64p, 1);
  vx = __riscv_vadd_vv_i64m1 (vx, vx, 1);
  vfloat16m1_t vy = __riscv_vfmv_s_f_f16m1 (y, 1);
  asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
}

zve64f:
foo:
vsetivli zero,1,e16,mf4,ta,ma
vle64.v v1,0(a0)
vfmv.s.f v2,fa0
vsetvli zero,zero,e64,m1,ta,ma
vadd.vv v1,v1,v1

zve64d:
foo:
vsetivli zero,1,e64,m1,ta,ma
vle64.v v1,0(a0)
vfmv.s.f v2,fa0
vadd.vv v1,v1,v1

gcc/ChangeLog:

PR target/111037
* config/riscv/riscv-vsetvl.cc (float_insn_valid_sew_p): New function.
(second_sew_less_than_first_sew_p): Fix bug.
(first_sew_less_than_second_sew_p): Ditto.

gcc/testsuite/ChangeLog:

PR target/111037
* gcc.target/riscv/rvv/base/pr111037-1.c: New test.
* gcc.target/riscv/rvv/base/pr111037-2.c: New test.

commit | commitdiff | tree

Haochen Jiang [Thu, 17 Aug 2023 06:19:59 +0000 (14:19 +0800)]

Support AVX10.1 for AVX512DQ+AVX512VL intrins

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_1-vandnpd-1.c: New test.
* gcc.target/i386/avx10_1-vandnps-1.c: Ditto.
* gcc.target/i386/avx10_1-vbroadcastf32x2-1.c: Ditto.
* gcc.target/i386/avx10_1-vbroadcastf64x2-1.c: Ditto.
* gcc.target/i386/avx10_1-vbroadcasti32x2-1.c: Ditto.
* gcc.target/i386/avx10_1-vbroadcasti64x2-1.c: Ditto.
* gcc.target/i386/avx10_1-vcvtpd2qq-1.c: Ditto.
* gcc.target/i386/avx10_1-vcvtpd2uqq-1.c: Ditto.
* gcc.target/i386/avx10_1-vcvttpd2qq-1.c: Ditto.
* gcc.target/i386/avx10_1-vcvttpd2uqq-1.c: Ditto.
* gcc.target/i386/avx10_1-vcvttps2qq-1.c: Ditto.
* gcc.target/i386/avx10_1-vcvttps2uqq-1.c: Ditto.
* gcc.target/i386/avx10_1-vpmullq-1.c: Ditto.

commit | commitdiff | tree

Haochen Jiang [Thu, 17 Aug 2023 06:19:05 +0000 (14:19 +0800)]

Support AVX10.1 for AVX512DQ+AVX512VL intrins

gcc/ChangeLog:

* config/i386/avx512vldqintrin.h: Remove target attribute.
* config/i386/i386-builtin.def (BDESC):
Add OPTION_MASK_ISA2_AVX10_1.
* config/i386/i386-builtins.cc (def_builtin): Handle AVX10_1.
* config/i386/i386-expand.cc
(ix86_check_builtin_isa_match): Ditto.
(ix86_expand_sse2_mulvxdi3): Add TARGET_AVX10_1.
* config/i386/i386.md: Add new isa attribute avx10_1_or_avx512dq
and avx10_1_or_avx512vl.
* config/i386/sse.md: (VF2_AVX512VLDQ_AVX10_1): New.
(VF1_128_256VLDQ_AVX10_1): Ditto.
(VI8_AVX512VLDQ_AVX10_1): Ditto.
(<sse>_andnot<mode>3<mask_name>):
Add TARGET_AVX10_1 and change isa attr from avx512dq to
avx10_1_or_avx512dq.
(*andnot<mode>3): Add TARGET_AVX10_1 and change isa attr from
avx512vl to avx10_1_or_avx512vl.
(fix<fixunssuffix>_trunc<mode><sseintvecmodelower>2<mask_name><round_saeonly_name>):
Change iterator to VF2_AVX512VLDQ_AVX10_1. Remove target check.
(fix_notrunc<mode><sseintvecmodelower>2<mask_name><round_name>):
Ditto.
(ufix_notrunc<mode><sseintvecmodelower>2<mask_name><round_name>):
Ditto.
(fix<fixunssuffix>_trunc<mode><sselongvecmodelower>2<mask_name><round_saeonly_name>):
Change iterator to VF1_128_256VLDQ_AVX10_1. Remove target check.
(avx512dq_fix<fixunssuffix>_truncv2sfv2di2<mask_name>):
Add TARGET_AVX10_1.
(fix<fixunssuffix>_truncv2sfv2di2): Ditto.
(cond_mul<mode>): Change iterator to VI8_AVX10_1_AVX512DQVL.
Remove target check.
(avx512dq_mul<mode>3<mask_name>): Ditto.
(*avx512dq_mul<mode>3<mask_name>): Ditto.
(VI4F_BRCST32x2): Add TARGET_AVX512DQ and TARGET_AVX10_1.
(<mask_codefor>avx512dq_broadcast<mode><mask_name>):
Remove target check.
(VI8F_BRCST64x2): Add TARGET_AVX512DQ and TARGET_AVX10_1.
(<mask_codefor>avx512dq_broadcast<mode><mask_name>_1):
Remove target check.
* config/i386/subst.md (mask_mode512bit_condition): Add TARGET_AVX10_1.
(mask_avx512vl_condition): Ditto.
(mask): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx-1.c: Add -mavx10.1.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/sse-26.c: Skip AVX512VLDQ intrin file.

commit | commitdiff | tree

Haochen Jiang [Thu, 17 Aug 2023 06:15:43 +0000 (14:15 +0800)]

Emit a warning when AVX10 options conflict in vector width

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(ix86_check_avx10_vector_width): New function to check isa_flags
to emit a warning when there is a conflict in AVX10 options for
vector width.
(ix86_handle_option): Add check for avx10.1-256 and avx10.1-512.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Do not append -mno-avx10-max-512bit for -march=native.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_1-15.c: New test.
* gcc.target/i386/avx10_1-16.c: Ditto.
* gcc.target/i386/avx10_1-17.c: Ditto.
* gcc.target/i386/avx10_1-18.c: Ditto.

commit | commitdiff | tree

Haochen Jiang [Thu, 17 Aug 2023 06:13:28 +0000 (14:13 +0800)]

Emit a warning when disabling AVX512 with AVX10 enabled or disabling AVX10 with AVX512 enabled

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(ix86_check_avx10): New function to check isa_flags and
isa_flags_explicit to emit warning when AVX10 is enabled
by "-m" option.
(ix86_check_avx512): New function to check isa_flags and
isa_flags_explicit to emit warning when AVX512 is enabled
by "-m" option.
(ix86_handle_option): Do not change the flags when warning
is emitted.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Do not append -mno-avx10.1 for -march=native.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_1-11.c: New test.
* gcc.target/i386/avx10_1-12.c: Ditto.
* gcc.target/i386/avx10_1-13.c: Ditto.
* gcc.target/i386/avx10_1-14.c: Ditto.

commit | commitdiff | tree

Haochen Jiang [Tue, 30 Aug 2022 06:42:30 +0000 (14:42 +0800)]

Initial support for AVX10.1

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features):
Add avx10_set and version and detect avx10.1.
(cpu_indicator_init): Handle avx10.1-512.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_512BIT_SET): New.
(OPTION_MASK_ISA2_AVX10_1_SET): Ditto.
(OPTION_MASK_ISA2_AVX10_512BIT_UNSET): Ditto.
(OPTION_MASK_ISA2_AVX10_1_UNSET): Ditto.
(OPTION_MASK_ISA2_AVX2_UNSET): Modify for AVX10_1.
(ix86_handle_option): Handle -mavx10.1, -mavx10.1-256 and
-mavx10.1-512.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AVX10_512BIT, FEATURE_AVX10_1 and
FEATURE_AVX10_512BIT.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for
AVX10_512BIT, AVX10_1 and AVX10_1_512.
* config/i386/constraints.md (Yk): Add AVX10_1.
(Yv): Ditto.
(k): Ditto.
* config/i386/cpuid.h (bit_AVX10): New.
(bit_AVX10_256): Ditto.
(bit_AVX10_512): Ditto.
* config/i386/i386-c.cc (ix86_target_macros_internal):
Define AVX10_512BIT and AVX10_1.
* config/i386/i386-isa.def
(AVX10_512BIT): Add DEF_PTA(AVX10_512BIT).
(AVX10_1): Add DEF_PTA(AVX10_1).
* config/i386/i386-options.cc (isa2_opts): Add -mavx10.1.
(ix86_valid_target_attribute_inner_p): Handle avx10-512bit, avx10.1
and avx10.1-512.
(ix86_option_override_internal): Enable AVX512{F,VL,BW,DQ,CD,BF16,
FP16,VBMI,VBMI2,VNNI,IFMA,BITALG,VPOPCNTDQ} features for avx10.1-512.
(ix86_valid_target_attribute_inner_p): Handle AVX10_1.
* config/i386/i386.cc (ix86_get_ssemov): Add AVX10_1.
(ix86_conditional_register_usage): Ditto.
(ix86_hard_regno_mode_ok): Ditto.
(ix86_rtx_costs): Ditto.
* config/i386/i386.h (VALID_MASK_AVX10_MODE): New macro.
* config/i386/i386.opt: Add option -mavx10.1, -mavx10.1-256 and
-mavx10.1-512.
* doc/extend.texi: Document avx10.1, avx10.1-256 and avx10.1-512.
* doc/invoke.texi: Document -mavx10.1, -mavx10.1-256 and -mavx10.1-512.
* doc/sourcebuild.texi: Document target avx10.1, avx10.1-256
and avx10.1-512.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv33.C: New test.
* gcc.target/i386/avx10_1-1.c: Ditto.
* gcc.target/i386/avx10_1-2.c: Ditto.
* gcc.target/i386/avx10_1-3.c: Ditto.
* gcc.target/i386/avx10_1-4.c: Ditto.
* gcc.target/i386/avx10_1-5.c: Ditto.
* gcc.target/i386/avx10_1-6.c: Ditto.
* gcc.target/i386/avx10_1-7.c: Ditto.
* gcc.target/i386/avx10_1-8.c: Ditto.
* gcc.target/i386/avx10_1-9.c: Ditto.
* gcc.target/i386/avx10_1-10.c: Ditto.

commit | commitdiff | tree

Sergei Trofimovich [Wed, 16 Aug 2023 20:20:08 +0000 (21:20 +0100)]

Drop unused enum vrp_mode.

Follow up on the removal of EVRP and clean up unused defines.

gcc/
* flag-types.h (vrp_mode): Remove unused.

commit | commitdiff | tree

Yanzhang Wang [Thu, 17 Aug 2023 04:28:50 +0000 (22:28 -0600)]

[PATCH] RISC-V: Support simplify (-1-x) for vector.

From: Yanzhang Wang <yanzhang.wang@intel.com>

The pattern is enabled for scalar but not for vector. This patch makes the
two consistent and converts the code below,

shortcut_for_riscv_vrsub_case_1_32:
        vl1re32.v       v1,0(a1)
        vsetvli zero,a2,e32,m1,ta,ma
        vrsub.vi        v1,v1,-1
        vs1r.v  v1,0(a0)
        ret

to,

shortcut_for_riscv_vrsub_case_1_32:
        vl1re32.v       v1,0(a1)
        vsetvli zero,a2,e32,m1,ta,ma
        vnot.v  v1,v1
        vs1r.v  v1,0(a0)
        ret

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Use
CONSTM1_RTX.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/simplify-vrsub.c: New test.
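
The vrsub-to-vnot rewrite above rests on the identity (-1 - x) == ~x. A minimal scalar sketch of that identity follows; the function names are hypothetical and unsigned arithmetic is used so that wraparound is well defined.

```cpp
#include <cassert>
#include <cstdint>

// Computes (-1 - x) modulo 2^32, i.e. what a reverse subtract from -1 yields.
inline std::uint32_t minus_one_minus(std::uint32_t x)
{
  return static_cast<std::uint32_t>(-1) - x;
}

// Computes the bitwise complement of x.
inline std::uint32_t bitwise_not(std::uint32_t x)
{
  return static_cast<std::uint32_t>(~x);
}

// True if both forms agree for this particular x.
inline bool identity_holds(std::uint32_t x)
{
  return minus_one_minus(x) == bitwise_not(x);
}
```

Since all-ones minus x clears exactly the bits that are set in x, the two forms agree for every element value, which is why a single not can stand in for the subtraction.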

commit | commitdiff | tree

Andrew Pinski [Sat, 12 Aug 2023 01:19:01 +0000 (18:19 -0700)]

Add support for vector conditional not

Like the support for conditional neg (r12-4470-g20dcda98ed376cb61c74b2c71),
this adds conditional not as well.
Also, we should be able to turn `(a ? -1 : 0) ^ b` into a conditional
not.

OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.

gcc/ChangeLog:

* internal-fn.def (COND_NOT): New internal function.
* match.pd (UNCOND_UNARY, COND_UNARY): Add bit_not/not
to the lists.
(`vec (a ? -1 : 0) ^ b`): New pattern to convert
into conditional not.
* optabs.def (cond_one_cmpl): New optab.
(cond_len_one_cmpl): Likewise.

gcc/testsuite/ChangeLog:

PR target/110986
* gcc.target/aarch64/sve/cond_unary_9.c: New test.
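
To see why `(a ? -1 : 0) ^ b` is a conditional not, the two forms can be compared element by element. The sketch below uses plain arrays rather than real vector types; `Vec`, `Mask` and both function names are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 4;
using Vec = std::array<std::int32_t, N>;
using Mask = std::array<bool, N>;

// The form named in the match.pd entry: (a ? -1 : 0) ^ b, per element.
Vec xor_with_select(const Mask& a, const Vec& b)
{
  Vec r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = (a[i] ? -1 : 0) ^ b[i];
  return r;
}

// The conditional-not form: a ? ~b : b, per element.
Vec conditional_not(const Mask& a, const Vec& b)
{
  Vec r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = a[i] ? ~b[i] : b[i];
  return r;
}
```

XOR with all-ones flips every bit and XOR with zero leaves the value unchanged, so both functions compute the same result for any mask.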

commit | commitdiff | tree

GCC Administrator [Thu, 17 Aug 2023 00:17:21 +0000 (00:17 +0000)]

Daily bump.

commit | commitdiff | tree

Andrew Pinski [Wed, 16 Aug 2023 22:29:43 +0000 (15:29 -0700)]

Add libstdc++-v3/include/bits/version.h to gcc_update touch part

This adds libstdc++-v3/include/bits/version.h so it has the correct timestamp.

Committed as obvious after running contrib/gcc_update --touch

contrib/ChangeLog:

* gcc_update: Add libstdc++-v3/include/bits/version.h.

commit | commitdiff | tree

Harald Anlauf [Wed, 16 Aug 2023 20:00:49 +0000 (22:00 +0200)]

Fortran: fix memleak for character,value dummy of bind(c) procedure [PR110360]

Testcase gfortran.dg/bind_c_usage_13.f03 exhibited a memleak in the frontend
occurring when passing a character literal to a character,value dummy of a
bind(c) procedure, due to a missing cleanup in the conversion of the actual
argument expression.  Reduced testcase:

  program p
    interface
       subroutine val_c (c) bind(c)
         use iso_c_binding, only: c_char
         character(len=1,kind=c_char), value :: c
       end subroutine val_c
    end interface
    call val_c ("A")
  end

gcc/fortran/ChangeLog:

PR fortran/110360
* trans-expr.cc (conv_scalar_char_value): Use gfc_replace_expr to
avoid leaking replaced gfc_expr.

commit | commitdiff | tree

Jonathan Wakely [Tue, 15 Aug 2023 12:48:23 +0000 (13:48 +0100)]

libstdc++: Fix std::basic_string::resize_and_overwrite

The callable used for resize_and_overwrite was being passed the string's
expanded capacity, which might be greater than the new size being
requested. This is not conforming, as the standard requires the same n
to be passed to the callable that the user passed to
resize_and_overwrite.

The existing tests didn't catch this because they all used a value which
was more than twice the existing capacity, so the _M_create call
allocated exactly what was requested, and the value passed to the
callable was correct. But when the requested size is greater than the
current capacity but smaller than twice the current capacity, _M_create
will allocate twice the current capacity and then that value was being
passed to the callable.

I noticed this because std::format(L"{}", 0.25) was producing L"0.25XX"
where the XX characters were whatever happened to be on the stack before
the call. When std::format used resize_and_overwrite to widen a string
it was copying too many characters into the destination and setting the
result's length too long. I've added a test for this case, and a new
test that doesn't hardcode -std=gnu++20 so can be used to test
std::format in C++23 and C++26 modes.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.tcc (resize_and_overwrite): Invoke
the callable with the same size as resize_and_overwrite was
called with.
* testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite.cc:
Check with small values for the new size.
* testsuite/std/format/functions/format.cc: Check wide
formatting of double values that produce small strings.
* testsuite/std/format/functions/format_c++23.cc: New test.
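
The requirement described above, that the callable receives the same n the caller passed, can be observed with a small helper. This is a sketch, not the testsuite code: `observed_size` is a hypothetical name, and the fallback branch exists only so the snippet also compiles where `resize_and_overwrite` is unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: makes `s` hold n copies of 'x' and returns the size
// value that the overwrite operation was invoked with.
std::size_t observed_size(std::string& s, std::size_t n)
{
  std::size_t seen = 0;
#ifdef __cpp_lib_string_resize_and_overwrite
  s.resize_and_overwrite(n, [&seen](char* p, std::size_t count) {
    seen = count;                 // the standard requires count == n
    for (std::size_t i = 0; i < count; ++i)
      p[i] = 'x';
    return count;                 // becomes the new length of the string
  });
#else
  // Fallback for modes without resize_and_overwrite; here the observed
  // value is trivially n.
  s.assign(n, 'x');
  seen = n;
#endif
  return seen;
}
```

To exercise the case this commit fixes, request a size that is greater than `s.capacity()` but smaller than twice that value; a conforming library still invokes the callable with the requested size.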

commit | commitdiff | tree

Jonathan Wakely [Wed, 16 Aug 2023 16:36:45 +0000 (17:36 +0100)]

libstdc++: Update __cplusplus value for C++23 in version.def

libstdc++-v3/ChangeLog:

* include/bits/version.def (stds): Update value for C++23.
* include/bits/version.h: Regenerate.

commit | commitdiff | tree

Surya Kumari Jangala [Mon, 14 Aug 2023 14:34:56 +0000 (09:34 -0500)]

ira: update allocated_hardreg_p[] in improve_allocation() [PR110254]

The improve_allocation() routine does not update the
allocated_hardreg_p[] array after an allocno is assigned a register.

If the register chosen in improve_allocation() is one that already has
been assigned to a conflicting allocno, then allocated_hardreg_p[]
already has the corresponding bit set to TRUE, so nothing needs to be
done.

But improve_allocation() can also choose a register that has not been
assigned to a conflicting allocno, and also has not been assigned to any
other allocno. In this case, allocated_hardreg_p[] has to be updated.

2023-07-21 Surya Kumari Jangala <jskumari@linux.ibm.com>

gcc/
PR rtl-optimization/110254
* ira-color.cc (improve_allocation): Update array
allocated_hardreg_p.

commit | commitdiff | tree

Jonathan Wakely [Wed, 16 Aug 2023 11:24:35 +0000 (12:24 +0100)]

libstdc++: Fix comment naming upstream PSTL test file

These tests were derived from set.pass.cpp not set.pass.cc, specifically
pstl/test/std/algorithms/alg.sorting/alg.set.operations/set.pass.cpp in
the LLVM repo.

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/pstl/alg_sorting/set_difference.cc:
Fix name of upstream file this was derived from.
* testsuite/25_algorithms/pstl/alg_sorting/set_intersection.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_symmetric_difference.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_union.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_util.h: Likewise.

commit | commitdiff | tree

Vladimir N. Makarov [Wed, 16 Aug 2023 13:13:54 +0000 (09:13 -0400)]

[LRA]: Spill pseudos assigned to fp when fp->sp elimination became impossible

Porting LRA to AVR revealed that creating a stack slot can make fp->sp
elimination impossible.  The previous patches undid the fp assignment after
the stack slot creation but calculated live info wrongly afterwards.  This
resulted in wrong code generation by deleting some insns that were still
live.  This patch fixes this problem.

gcc/ChangeLog:

* lra-int.h (lra_update_fp2sp_elimination): Change the prototype.
* lra-eliminations.cc (spill_pseudos): Record spilled pseudos.
(lra_update_fp2sp_elimination): Ditto.
(update_reg_eliminate): Adjust spill_pseudos call.
* lra-spills.cc (lra_spill): Assign stack slots to pseudos spilled
in lra_update_fp2sp_elimination.

commit | commitdiff | tree

Arsen Arsenović [Thu, 27 Apr 2023 19:03:15 +0000 (21:03 +0200)]

libstdc++: Replace all manual FTM definitions and use

libstdc++-v3/ChangeLog:

* libsupc++/typeinfo: Switch to bits/version.h for
__cpp_lib_constexpr_typeinfo.
* libsupc++/new: Switch to bits/version.h for
__cpp_lib_{launder,hardware_interference_size,destroying_delete}.
(launder): Guard behind __cpp_lib_launder.
(hardware_destructive_interference_size)
(hardware_constructive_interference_size): Guard behind
__cpp_lib_hardware_interference_size.
* libsupc++/exception: Switch to bits/version.h for
__cpp_lib_uncaught_exceptions.
(uncaught_exceptions): Guard behind __cpp_lib_uncaught_exceptions.
* libsupc++/compare: Switch to bits/version.h for
__cpp_lib_three_way_comparison.
(three_way_comparable, three_way_comparable_with)
(compare_three_way, weak_order, strong_order, partial_order):
Guard behind __cpp_lib_three_way_comparison >= 201907L.
* include/std/chrono: Drop __cpp_lib_chrono definition.
* include/std/vector: Switch to bits/version.h for
__cpp_lib_erase_if.
(erase, erase_if): Guard behind __cpp_lib_erase_if.
* include/std/variant: Switch to bits/version.h for
__cpp_lib_variant.  Guard whole header behind that FTM.
* include/std/utility: Switch to bits/version.h for
__cpp_lib_{exchange_function,constexpr_algorithms,as_const},
__cpp_lib_{integer_comparison_functions,to_underlying}, and
__cpp_lib_unreachable.
(exchange): Guard behind __cpp_lib_exchange_function.
(cmp_equal, cmp_not_equal, cmp_less, cmp_greater, cmp_less_equal)
(cmp_greater_equal, in_range): Guard behind
__cpp_lib_integer_comparison_functions.
(to_underlying): Guard behind __cpp_lib_to_underlying.
(unreachable): Guard behind __cpp_lib_unreachable.
* include/std/type_traits: Switch to bits/version.h for
__cpp_lib_is_{null_pointer,final,nothrow_convertible,aggregate},
__cpp_lib_is_{constant_evaluated,invocable,layout_compatible},
__cpp_lib_is_{pointer_interconvertible,scoped_enum,swappable},
__cpp_lib_{logical_traits,reference_from_temporary,remove_cvref},
__cpp_lib_{result_of_sfinae,transformation_trait_aliases},
__cpp_lib_{type_identity,type_trait_variable_templates},
__cpp_lib_{unwrap_ref,void_t,integral_constant_callable},
__cpp_lib_{bool_constant,bounded_array_traits}, and
__cpp_lib_has_unique_object_representations.
(integral_constant::operator()): Guard behind
__cpp_lib_integral_constant_callable.
(bool_constant): Guard behind __cpp_lib_bool_constant.
(conjunction, disjunction, negation, conjunction_v, disjunction_v)
(negation_v): Guard behind __cpp_lib_logical_traits.
(is_null_pointer): Guard behind __cpp_lib_is_null_pointer.
(is_final): Guard behind __cpp_lib_is_final.
(is_nothrow_convertible, is_nothrow_convertible_v): Guard behind
__cpp_lib_is_nothrow_convertible.
(remove_const_t, remove_volatile_t, remove_cv_t)
(add_const_t, add_volatile_t, add_cv_t): Guard behind
__cpp_lib_transformation_trait_aliases.
(void_t): Guard behind __cpp_lib_void_t.
(is_swappable_with_v, is_nothrow_swappable_with_v)
(is_swappable_with, is_nothrow_swappable_with): Guard behind
__cpp_lib_is_swappable.
(is_nothrow_invocable_r, is_invocable_r, invoke_result)
(is_invocable, invoke_result_t): Guard behind
__cpp_lib_is_invocable.
(alignment_of_v, extent_v, has_virtual_destructor_v)
(is_abstract_v, is_arithmetic_v, is_array_v)
(is_assignable_v, is_base_of_v, is_class_v, is_compound_v)
(is_constructible_v, is_const_v, is_convertible_v)
(is_copy_assignable_v, is_copy_constructible_v)
(is_default_constructible_v, is_destructible_v)
(is_empty_v, is_enum_v, is_final_v, is_floating_point_v)
(is_function_v, is_fundamental_v, is_integral_v)
(is_invocable_r_v, is_invocable_v, is_literal_type_v)
(is_lvalue_reference_v, is_member_function_pointer_v)
(is_member_object_pointer_v, is_member_pointer_v)
(is_move_assignable_v, is_move_constructible_v)
(is_nothrow_assignable_v, is_nothrow_constructible_v)
(is_nothrow_copy_assignable_v, is_nothrow_copy_constructible_v)
(is_nothrow_default_constructible_v, is_nothrow_destructible_v)
(is_nothrow_invocable_r_v, is_nothrow_invocable_v)
(is_nothrow_move_assignable_v, is_nothrow_move_constructible_v)
(is_null_pointer_v, is_object_v, is_pod_v, is_pointer_v)
(is_polymorphic_v, is_reference_v, is_rvalue_reference_v)
(is_same_v, is_scalar_v, is_signed_v, is_standard_layout_v)
(is_trivially_assignable_v, is_trivially_constructible_v)
(is_trivially_copyable_v, is_trivially_copy_assignable_v)
(is_trivially_copy_constructible_v)
(is_trivially_default_constructible_v)
(is_trivially_destructible_v, is_trivially_move_assignable_v)
(is_trivially_move_constructible_v, is_trivial_v, is_union_v)
(is_unsigned_v, is_void_v, is_volatile_v, rank_v, as variadic):
Guard behind __cpp_lib_type_trait_variable_templates.
(has_unique_object_representations)
(has_unique_object_representations_v): Guard behind
__cpp_lib_has_unique_object_representations.
(is_aggregate): Guard behind __cpp_lib_is_aggregate.
(remove_cvref, remove_cvref_t): Guard behind
__cpp_lib_remove_cvref.
(type_identity, type_identity_t): Guard behind
__cpp_lib_type_identity.
(unwrap_reference, unwrap_reference_t, unwrap_ref_decay)
(unwrap_ref_decay_t): Guard behind __cpp_lib_unwrap_ref.
(is_bounded_array_v, is_unbounded_array_v, is_bounded_array)
(is_unbounded_array): Guard behind __cpp_lib_bounded_array_traits.
(is_scoped_enum, is_scoped_enum_v): Guard behind
__cpp_lib_is_scoped_enum.
(reference_constructs_from_temporary)
(reference_constructs_from_temporary_v): Guard behind
__cpp_lib_reference_from_temporary.
* include/std/tuple: Switch to bits/version.h for
__cpp_lib_{constexpr_tuple,tuples_by_type,apply,make_from_tuple}.
(get<T>): Guard behind __cpp_lib_tuples_by_type.
(apply): Guard behind __cpp_lib_apply.
(make_from_tuple): Guard behind __cpp_lib_make_from_tuple.
* include/std/syncstream: Switch to bits/version.h for
__cpp_lib_syncbuf.  Guard header behind that FTM.
* include/std/string_view: Switch to bits/version.h for
__cpp_lib_{string_{view,contains},constexpr_string_view} and
__cpp_lib_starts_ends_with.
(basic_string_view::starts_with, basic_string_view::ends_with):
Guard behind __cpp_lib_starts_ends_with.
[C++23 && _GLIBCXX_HOSTED && !defined(__cpp_lib_string_contains)]:
Assert as impossible without a bug in C++23.
* include/std/string: Switch to bits/version.h for
__cpp_lib_erase_if.
(erase, erase_if): Guard behind __cpp_lib_erase_if.
* include/std/thread: Switch to bits/version.h for
__cpp_lib_jthread.
* include/std/stop_token: Switch to bits/version.h for
__cpp_lib_jthread.
* include/std/spanstream: Switch to bits/version.h for
__cpp_lib_spanstream.  Guard header behind that FTM.
* include/std/span: Switch to bits/version.h for __cpp_lib_span.
Guard header behind that FTM.
* include/std/source_location: Switch to bits/version.h for
__cpp_lib_source_location.  Guard header with that FTM.
* include/std/shared_mutex: Switch to bits/version.h for
__cpp_lib_shared{,_timed}_mutex.
(shared_mutex): Guard behind __cpp_lib_shared_mutex.
* include/std/semaphore: Switch to bits/version.h for
__cpp_lib_semaphore.  Guard header behind that FTM.
* include/std/ranges: Switch to bits/version.h for
__cpp_lib_ranges_{zip,chunk{,_by},slide,join_with},
__cpp_lib_ranges_{repeat,stride,cartesian_product,as_rvalue},
and __cpp_lib_ranges_{as_const,enumerate,iota}.
(ranges::zip et al, ranges::chunk et al, ranges::slide et al)
(ranges::chunk_by et al, ranges::join_with et al)
(ranges::stride et al, ranges::cartesian_product et al)
(ranges::as_rvalue et al, ranges::as_const et al)
(ranges::enumerate et al): Guard behind appropriate FTM.
* include/std/optional: Switch to bits/version.h for
__cpp_lib_optional.  Guard header behind that FTM.
* include/std/numeric: Switch to bits/version.h for
__cpp_lib_{gcd{,_lcm},lcm,constexpr_numeric,interpolate}
and __cpp_lib_parallel_algorithm.
(gcd, lcm): Guard behind __cpp_lib_gcd_lcm.
(midpoint): Guard behind __cpp_lib_interpolate.
* include/std/numbers: Switch to bits/version.h for
__cpp_lib_math_constants.  Guard header behind that FTM.
* include/std/mutex: Switch to bits/version.h for
__cpp_lib_scoped_lock.
(scoped_lock): Guard behind __cpp_lib_scoped_lock.
* include/std/memory_resource: Switch to bits/version.h for
__cpp_lib_{polymorphic_allocator,memory_resource}.
(synchronized_pool_resource): Guard behind
__cpp_lib_memory_resource >= 201603L.
(polymorphic_allocator): Guard behind
__cpp_lib_polymorphic_allocator.
* include/std/memory: Switch to bits/version.h for
__cpp_lib_{parallel_algorithm,atomic_value_initialization}.
* include/std/list: Switch to bits/version.h for
__cpp_lib_erase_if.
(erase, erase_if): Guard behind __cpp_lib_erase_if.
* include/std/latch: Switch to bits/version.h for __cpp_lib_latch.
Guard header behind that FTM.
* include/std/iterator: Switch to bits/version.h for
__cpp_lib_null_iterators.
* include/std/iomanip: Switch to bits/version.h for
__cpp_lib_quoted_string_io.
(quoted): Guard behind __cpp_lib_quoted_string_io.
* include/std/functional: Switch to bits/version.h for
__cpp_lib_{invoke{,_r},constexpr_functional,bind_front} and
__cpp_lib_{not_fn,boyer_moore_searcher}.
(invoke): Guard behind __cpp_lib_invoke.
(invoke_r): Guard behind __cpp_lib_invoke_r.
(bind_front): Guard behind __cpp_lib_bind_front.
(not_fn): Guard behind __cpp_lib_not_fn.
(boyer_moore_searcher, boyer_moore_horspool_searcher): Guard
definition behind __cpp_lib_boyer_moore_searcher.
* include/std/forward_list: Switch to bits/version.h for
__cpp_lib_erase_if.
(erase, erase_if): Guard behind __cpp_lib_erase_if.
* include/std/format: Switch to bits/version.h for
__cpp_lib_format.  Guard header behind that FTM.
* include/std/filesystem: Switch to bits/version.h for
__cpp_lib_filesystem.  Guard header behind that FTM.
* include/std/expected: Switch to bits/version.h for
__cpp_lib_expected.  Guard header behind it.
* include/std/execution: Switch to bits/version.h for
__cpp_lib_{execution,parallel_algorithm}.  Guard header behind
either.
* include/std/deque: Switch to bits/version.h for
__cpp_lib_erase_if.
(erase, erase_if): Guard behind __cpp_lib_erase_if.
* include/std/coroutine: Switch to bits/version.h for
__cpp_lib_coroutine.  Guard header behind that FTM.
* include/std/concepts: Switch to bits/version.h for
__cpp_lib_concepts.  Guard header behind that FTM.
* include/std/complex: Switch to bits/version.h for
__cpp_lib_{complex_udls,constexpr_complex}.
(operator""if, operator""i, operator""il): Guard behind
__cpp_lib_complex_udls.
* include/std/charconv: Switch to bits/version.h for
__cpp_lib_{to_chars,constexpr_charconv}.
* include/std/bitset: Switch to bits/version.h for
__cpp_lib_constexpr_bitset.
* include/std/bit: Switch to bits/version.h for
__cpp_lib_{bit_cast,byteswap,bitops,int_pow2,endian}.
(bit_cast): Guard behind __cpp_lib_bit_cast.
(byteswap): Guard behind __cpp_lib_byteswap.
(rotl, rotr, countl_zero, countl_one, countr_zero, countr_one)
(popcount): Guard behind __cpp_lib_bitops.
(has_single_bit, bit_ceil, bit_floor, bit_width): Guard behind
__cpp_lib_int_pow2.
(endian): Guard behind __cpp_lib_endian.
* include/std/barrier: Switch to bits/version.h for
__cpp_lib_barrier.  Guard header behind that FTM.
* include/std/atomic: Switch to bits/version.h for
__cpp_lib_atomic_{is_always_lock_free,float,ref}
and __cpp_lib_atomic_lock_free_type_aliases.
(*::is_always_lock_free): Guard behind
__cpp_lib_atomic_is_always_lock_free.
(atomic<float>): Guard behind __cpp_lib_atomic_float.
(atomic_ref): Guard behind __cpp_lib_atomic_ref.
(atomic_signed_lock_free, atomic_unsigned_lock_free): Guard behind
__cpp_lib_atomic_lock_free_type_aliases.
* include/std/array: Switch to bits/version.h for
__cpp_lib_to_array.
(to_array): Guard behind __cpp_lib_to_array.
* include/std/any: Switch to bits/version.h for __cpp_lib_any.
Guard header behind that FTM.
* include/std/algorithm: Switch to bits/version.h for
__cpp_lib_parallel_algorithm.
* include/c_global/cstddef: Switch to bits/version.h for
__cpp_lib_byte.
(byte): Guard behind __cpp_lib_byte.
* include/c_global/cmath: Switch to bits/version.h for
__cpp_lib_{hypot,interpolate}.
(hypot3): Guard behind __cpp_lib_hypot.
(lerp): Guard behind __cpp_lib_interpolate.
* include/c_compatibility/stdatomic.h: Switch to
bits/stl_version.h for __cpp_lib_atomic.  Guard header behind that
FTM.
* include/bits/utility.h: Switch to bits/version.h for
__cpp_lib_{tuple_element_t,integer_sequence,ranges_zip}.
(tuple_element_t): Guard behind __cpp_lib_tuple_element_t.
(integer_sequence et al): Guard behind __cpp_lib_integer_sequence.
* include/bits/uses_allocator_args.h: Switch to bits/version.h for
__cpp_lib_make_obj_using_allocator.  Guard header behind that FTM.
* include/bits/unordered_map.h: Switch to bits/version.h for
__cpp_lib_unordered_map_try_emplace.
(try_emplace): Guard behind __cpp_lib_unordered_map_try_emplace.
* include/bits/unique_ptr.h: Switch to bits/version.h for
__cpp_lib_{constexpr_memory,make_unique}.
(make_unique): Guard behind __cpp_lib_make_unique.
* include/bits/stl_vector.h: Switch to bits/version.h for
__cpp_lib_constexpr_vector.
* include/bits/stl_uninitialized.h: Switch to bits/version.h for
__cpp_lib_raw_memory_algorithms.
(uninitialized_default_construct)
(uninitialized_default_construct_n, uninitialized_move)
(uninitialized_move_n, uninitialized_value_construct)
(uninitialized_value_construct_n): Guard behind
__cpp_lib_raw_memory_algorithms.
* include/bits/stl_tree.h: Switch to bits/version.h for
__cpp_lib_generic_associative_lookup.
* include/bits/stl_stack.h: Switch to bits/version.h for
__cpp_lib_adaptor_iterator_pair_constructor.
(stack): Guard iterator-pair constructor behind
__cpp_lib_adaptor_iterator_pair_constructor.
* include/bits/stl_queue.h: Switch to bits/version.h for
__cpp_lib_adaptor_iterator_pair_constructor.
(queue): Guard iterator-pair constructor behind
__cpp_lib_adaptor_iterator_pair_constructor.
* include/bits/stl_pair.h: Switch to bits/version.h for
__cpp_lib_{concepts,tuples_by_type}.
(get): Guard type-getting overloads behind
__cpp_lib_tuples_by_type.
* include/bits/stl_map.h: Switch to bits/version.h for
__cpp_lib_map_try_emplace.
(map<>::try_emplace): Guard behind __cpp_lib_map_try_emplace.
* include/bits/stl_list.h: Switch to bits/version.h for
__cpp_lib_list_remove_return_type.
(__remove_return_type, _GLIBCXX_LIST_REMOVE_RETURN_TYPE_TAG)
[C++20]: Guard behind __cpp_lib_list_remove_return_type instead.
* include/bits/stl_iterator.h: Switch to bits/version.h for
__cpp_lib_{constexpr_iterator,array_constexpr} and
__cpp_lib_{make_reverse_iterator,move_iterator_concept}.
(make_reverse_iterator): Guard behind
__cpp_lib_make_reverse_iterator.
(iterator_concept et al): Guard __cpp_lib_move_iterator_concept
changes behind that FTM.
* include/bits/stl_function.h: Switch to bits/version.h for
__cpp_lib_transparent_operators.
(equal_to, not_equal_to, greater, less, greater_equal)
(less_equal, bit_and, bit_or, bit_xor, bit_not, logical_and)
(logical_or, logical_not, plus, minus, multiplies, divides)
(modulus, negate): Guard '= void' fwdecls behind
__cpp_lib_transparent_operators.
(plus<void>, minus<void>, multiplies<void>, divides<void>)
(modulus<void>, negate<void>, logical_and<void>, logical_or<void>)
(logical_not<void>, bit_and<void>, bit_or<void>, bit_xor<void>)
(equal_to<void>, not_equal_to<void>, greater<void>, less<void>)
(greater_equal<void>, less_equal<void>, bit_not<void>)
(__has_is_transparent): Guard behind
__cpp_lib_transparent_operators.
* include/bits/stl_algobase.h: Switch to bits/version.h for
__cpp_lib_robust_nonmodifying_seq_ops.
(robust equal, mismatch): Guard behind
__cpp_lib_robust_nonmodifying_seq_ops.
* include/bits/stl_algo.h: Switch to bits/version.h for
__cpp_lib_{clamp,sample}.
(clamp): Guard behind __cpp_lib_clamp.
(sample): Guard behind __cpp_lib_sample.
* include/bits/specfun.h: Switch to bits/version.h for
__cpp_lib_math_special_functions and __STDCPP_MATH_SPEC_FUNCS__.
* include/bits/shared_ptr_base.h: Switch to bits/version.h for
__cpp_lib_{smart_ptr_for_overwrite,shared_ptr_arrays}.
(_Sp_overwrite_tag): Guard behind
__cpp_lib_smart_ptr_for_overwrite.
* include/bits/shared_ptr_atomic.h: Switch to bits/version.h for
__cpp_lib_atomic_shared_ptr.
* include/bits/shared_ptr.h: Switch to bits/version.h for
__cpp_lib_{enable_shared_from_this,shared_ptr_weak_type}.
(shared_ptr<T>::weak_type): Guard behind
__cpp_lib_shared_ptr_weak_type.
(enable_shared_from_this<T>::weak_from_this): Guard behind
__cpp_lib_enable_shared_from_this.
* include/bits/ranges_cmp.h: Switch to bits/version.h for
__cpp_lib_ranges.
* include/bits/ranges_algo.h: Switch to bits/version.h for
__cpp_lib_{shift,ranges_{contains,find_last,fold,iota}}.
* include/bits/range_access.h: Switch to bits/version.h for
__cpp_lib_nonmember_container_access.
(size, empty, data): Guard behind
__cpp_lib_nonmember_container_access.
(ssize): Guard behind __cpp_lib_ssize.
* include/bits/ptr_traits.h: Switch to bits/version.h for
__cpp_lib_{constexpr_memory,to_address}.
(to_address): Guard behind __cpp_lib_to_address.
* include/bits/node_handle.h: Switch to bits/version.h for
__cpp_lib_node_extract.  Guard header behind that FTM.
* include/bits/move_only_function.h: Switch to bits/version.h for
__cpp_lib_move_only_function.  Guard header behind that FTM.
* include/bits/move.h: Switch to bits/version.h for
__cpp_lib_addressof_constexpr.
* include/bits/ios_base.h: Switch to bits/version.h for
__cpp_lib_ios_noreplace.
(noreplace): Guard with __cpp_lib_ios_noreplace.
* include/bits/hashtable.h: Switch to bits/version.h for
__cpp_lib_generic_unordered_lookup.
(_M_equal_range_tr, _M_count_tr, _M_find_tr): Guard behind
__cpp_lib_generic_unordered_lookup.
* include/bits/forward_list.h: Switch to bits/version.h for
__cpp_lib_list_remove_return_type.
(__remove_return_type): Guard behind
__cpp_lib_list_remove_return_type.
* include/bits/erase_if.h: Switch to bits/version.h for
__cpp_lib_erase_if.
* include/bits/cow_string.h: Switch to bits/version.h for
__cpp_lib_constexpr_string.
* include/bits/chrono.h: Switch to bits/version.h for
__cpp_lib_chrono{,_udls}.
(ceil): Guard behind __cpp_lib_chrono.
(operator""ns et al): Guard behind __cpp_lib_chrono_udls.
* include/bits/char_traits.h: Switch to bits/version.h for
__cpp_lib_constexpr_char_traits.
* include/bits/basic_string.h: Switch to bits/version.h for
__cpp_lib_{constexpr_string,string_{resize_and_overwrite,udls}}.
(resize_and_overwrite): Guard behind
__cpp_lib_string_resize_and_overwrite.
(operator""s): Guard behind __cpp_lib_string_udls.
* include/bits/atomic_wait.h: Switch to bits/version.h for
__cpp_lib_atomic_wait.  Guard header behind that FTM.
* include/bits/atomic_base.h: Switch to bits/version.h for
__cpp_lib_atomic_value_initialization and
__cpp_lib_atomic_flag_test.
(atomic_flag::test): Guard behind __cpp_lib_atomic_flag_test,
rather than C++20.
* include/bits/allocator.h: Switch to bits/version.h for
__cpp_lib_incomplete_container_elements.
* include/bits/alloc_traits.h: Switch to using bits/version.h for
__cpp_lib_constexpr_dynamic_alloc and
__cpp_lib_allocator_traits_is_always_equal.
* include/bits/align.h: Switch to bits/version.h for defining
__cpp_lib_assume_aligned.
(assume_aligned): Guard with __cpp_lib_assume_aligned.
* include/bits/algorithmfwd.h: Switch to bits/version.h for
defining __cpp_lib_constexpr_algorithms.
* include/std/stacktrace: Switch to bits/version.h for
__cpp_lib_stacktrace.  Guard header behind that FTM.
* testsuite/23_containers/array/tuple_interface/get_neg.cc:
Update line numbers.
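
Seen from calling code, the guards listed above mean that a facility is
declared only when its feature-test macro is defined.  A minimal sketch of
relying on that, assuming a C++17-or-later compiler; the helper name
starts_with_compat is hypothetical and not part of libstdc++:

```cpp
#include <cassert>
#include <string_view>
#if defined(__has_include)
# if __has_include(<version>)
#  include <version>   // defines the __cpp_lib_* feature-test macros
# endif
#endif

// Hypothetical helper: use the member function when the library advertises
// it, otherwise fall back to a manual prefix comparison.
inline bool starts_with_compat(std::string_view s, std::string_view prefix)
{
#if defined(__cpp_lib_starts_ends_with)
  return s.starts_with(prefix);
#else
  return s.size() >= prefix.size() && s.substr(0, prefix.size()) == prefix;
#endif
}

// Small self-check showing the intended behaviour on either code path.
inline void starts_with_compat_demo()
{
  assert(starts_with_compat("tzdata.zi", "tz"));
  assert(!starts_with_compat("tz", "tzdata"));
}
```

Both branches give the same answers, so the caller does not need to know
which one was compiled.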

Arsen Arsenović [Mon, 27 Mar 2023 13:29:08 +0000 (15:29 +0200)]

libstdc++: Implement more maintainable <version> header

This commit replaces the ad-hoc logic in <version> with an AutoGen
database that (mostly) declaratively generates a version.h bit which
combines all of the FTM logic across all headers together.

This generated header defines macros of the form __glibcxx_foo,
equivalent to their __cpp_lib_foo variants, according to rules specified
in version.def and, optionally, if __glibcxx_want_foo or
__glibcxx_want_all are defined, also defines __cpp_lib_foo forms with
the same definition.
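
The request protocol described above can be emulated in a single file.  This
is only a sketch: every DEMO_* name is hypothetical, the value and the
precondition are made up for illustration, and the real generated header may
differ in detail.

```cpp
#include <cassert>

// A header that needs the public macro asks for it before pulling in the
// generated file (emulated inline here).
#define DEMO_WANT_FOO

// ---- what a generated block for one macro might look like ----
#if __cplusplus >= 201103L               // illustrative precondition
# define DEMO_GLIBCXX_FOO 201606L        // internal form
# if defined(DEMO_WANT_ALL) || defined(DEMO_WANT_FOO)
#  define DEMO_CPP_LIB_FOO 201606L       // public form, only on request
# endif
#endif
// ---- end of emulated generated block ----

// Expose what ended up defined so it can be inspected at run time.
#if defined(DEMO_GLIBCXX_FOO)
constexpr long demo_internal_value = DEMO_GLIBCXX_FOO;
#else
constexpr long demo_internal_value = 0;
#endif

#if defined(DEMO_CPP_LIB_FOO)
constexpr long demo_public_value = DEMO_CPP_LIB_FOO;
#else
constexpr long demo_public_value = 0;
#endif

inline void demo_check()
{
  // The public form carries the same definition as the internal one.
  assert(demo_internal_value == demo_public_value);
}
```

Without the DEMO_WANT_FOO request (and without DEMO_WANT_ALL), only the
internal form would be defined in this emulation.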

libstdc++-v3/ChangeLog:

* include/Makefile.am (bits_freestanding): Add version.h.
(allcreated): Add version.h.
(${bits_srcdir}/version.h): New rule.  Regenerates
version.h out of version.{def,tpl}.
* include/Makefile.in: Regenerate.
* include/bits/version.def: New file.  Declares a list of
all feature test macros, their values and their preconditions.
* include/bits/version.tpl: New file.  Turns version.def
into a sequence of #if blocks.
* include/bits/version.h: New file.  Generated from
version.def.
* include/std/version: Replace with a __glibcxx_want_all define
and bits/version.h include.

Richard Ball [Wed, 16 Aug 2023 13:04:20 +0000 (14:04 +0100)]

aarch64: Add support for Cortex-A720 CPU

This patch adds support for the Cortex-A720 CPU to GCC.

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex-A720 CPU.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document Cortex-A720 CPU.

Robin Dapp [Mon, 31 Jul 2023 15:54:35 +0000 (17:54 +0200)]

RISC-V: Implement vector "average" autovec pattern.

This patch adds vector average patterns

op[0] = (narrow) ((wide) op[1] + (wide) op[2]) >> 1;
op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1) >> 1;

If there is no direct support, the vectorizer can synthesize the pattern
but, presumably, due to lack of narrowing operation support, won't try a
narrowing shift. Therefore, this patch implements the expanders
instead.
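
For reference, the two patterns correspond to scalar code of the following
shape.  This is a sketch for 8-bit unsigned elements only; the function names
are hypothetical and the element type is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Floor average: widen, add, shift right by one, narrow.
inline std::uint8_t avg_floor(std::uint8_t a, std::uint8_t b)
{
  return static_cast<std::uint8_t>(
      (static_cast<std::uint16_t>(a) + static_cast<std::uint16_t>(b)) >> 1);
}

// Ceil average: as above, with 1 added before the shift.
inline std::uint8_t avg_ceil(std::uint8_t a, std::uint8_t b)
{
  return static_cast<std::uint8_t>(
      (static_cast<std::uint16_t>(a) + static_cast<std::uint16_t>(b) + 1) >> 1);
}

// Loop of the kind a vectorizer could match against the floor pattern.
inline void avg_floor_loop(std::uint8_t *dst, const std::uint8_t *a,
                           const std::uint8_t *b, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = avg_floor(a[i], b[i]);
}

inline void avg_demo()
{
  assert(avg_floor(1, 2) == 1);
  assert(avg_ceil(1, 2) == 2);
}
```

The widening step is what keeps 255 + 255 from wrapping before the shift.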

gcc/ChangeLog:

* config/riscv/autovec.md (<u>avg<v_double_trunc>3_floor):
Implement expander.
(<u>avg<v_double_trunc>3_ceil): Ditto.
* config/riscv/vector-iterators.md (ashiftrt): New iterator.
(ASHIFTRT): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/vec-avg-run.c: New test.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-template.h: New test.

Robin Dapp [Tue, 15 Aug 2023 09:43:43 +0000 (11:43 +0200)]

internal-fn: Fix vector extraction into promoted subreg.

This patch fixes the case where vec_extract gets passed a promoted
subreg (e.g. from a return value). This is achieved by using
expand_convert_optab_fn instead of a separate expander function.

gcc/ChangeLog:

* internal-fn.cc (vec_extract_direct): Change type argument
numbers.
(expand_vec_extract_optab_fn): Call convert_optab_fn.
(direct_vec_extract_optab_supported_p): Use
convert_optab_supported_p.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4u.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c: New test.

Prathamesh Kulkarni [Wed, 16 Aug 2023 11:21:44 +0000 (16:51 +0530)]

Extend fold_vec_perm to handle VLA vector_cst.

The patch extends fold_vec_perm to fold VLA vector_csts.

For example:
arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
sel = { 0, len, ...} npatterns = 2, nelts_per_pattern = 1, len = 4 + 4x

res = VEC_PERM_EXPR<arg0, arg1, sel>
--> { arg0[0], arg1[0], ... }, npatterns = 2, nelts_per_pattern = 1

Example 2:
arg0 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x
arg1 = {...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x
sel = {0, 1, 2, ...}, npatterns = 1, nelts_per_pattern = 3, len = 2 + 2x

For this case the index 2 in sel is ambiguous for len 2 + 2x:
if x = 0, runtime vector length = 2 and sel[i] will choose arg1[0]
if x > 0, runtime vector length > 2 and sel[i] will choose arg0[2].
So we return NULL_TREE for this case.
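
The ambiguity can be reproduced with plain integers.  A sketch, assuming the
usual two-input permute convention where indices below len select from arg0
and indices from len upwards select from arg1; the names picked and pick are
hypothetical:

```cpp
#include <cassert>

struct picked
{
  int input;       // 0 for arg0, 1 for arg1
  unsigned index;  // element index inside that input
};

// Resolve one selector value for a concrete runtime length.
inline picked pick(unsigned sel_value, unsigned len)
{
  return sel_value < len ? picked{0, sel_value} : picked{1, sel_value - len};
}

inline void pick_demo()
{
  // len = 2 + 2x with x = 0: selector value 2 lands in arg1.
  assert(pick(2, 2).input == 1);
  // len = 2 + 2x with x = 1: the same selector value lands in arg0.
  assert(pick(2, 4).input == 0);
}
```

Because the answer changes with x, no single folded constant is correct for
every runtime length.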

This leads us to define a constraint: a stepped sequence in sel
should only select a particular pattern from a particular input vector.

Example 3:
arg0 = {...} npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
arg1 = {...} npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x
sel = { len, 0, 2, ... } npatterns = 1, nelts_per_pattern = 3, len = 4 + 4x

sel contains a single pattern with stepped sequence: {0, 2, ...}.
Let a1 = the first element of the stepped part of the sequence, which is 0.

Let esel = number of total elements in stepped sequence.
Thus,
esel = len / sel_npatterns
     = (4 + 4x) / 1
     = 4 + 4x

Let S = step of the sequence, which is 2 in this case.

Let ae = last element of the stepped sequence.
Thus,
ae = a1 + (esel - 2) * S
   = 0 + (4 + 4x - 2) * 2
   = 4 + 8x

To ensure that we select elements from the same input vector,
a1 /trunc len = ae /trunc len.
Let q1 = a1 /trunc len = 0 / (4 + 4x) = 0
Let qe = ae /trunc len = (4 + 8x) / (4 + 4x) = 1
Since q1 != qe, we cross input vectors, and return NULL_TREE for this case.

However, if sel was:
sel = {len, 0, 1, ...}

The only change in this case is S = 1.
So,
ae = a1 + (esel - 2) * S
   = 0 + (4 + 4x - 2) * 1
   = 2 + 4x

In this case, a1/len == ae/len == 0, and the stepped sequence chooses all elements
from arg0.
Thus,
res = {arg1[0], arg0[0], arg0[1], ...}

For VLA folding, sel has to conform to constraints imposed in
valid_mask_for_fold_vec_perm_cst_p.
test_fold_vec_perm_cst defines several unit-tests for VLA folding.
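
The arithmetic of Example 3 can be checked numerically.  The sketch below
evaluates the formulas above for a few concrete values of x; the real check
has to hold for every x, which sampling does not prove.  The function name is
hypothetical.

```cpp
#include <cassert>

// Returns true when the first and last elements of the stepped part of sel
// fall into the same input vector for one concrete len.
// Assumes len / sel_npatterns >= 2.
inline bool stepped_part_stays_in_one_input(unsigned a1, unsigned step,
                                            unsigned sel_npatterns,
                                            unsigned len)
{
  unsigned esel = len / sel_npatterns;    // elements in the stepped sequence
  unsigned ae = a1 + (esel - 2) * step;   // last element of the sequence
  return a1 / len == ae / len;            // same quotient means same input
}

inline void stepped_demo()
{
  for (unsigned x = 0; x < 4; ++x)
    {
      unsigned len = 4 + 4 * x;
      // sel = { len, 0, 2, ... }: S = 2 crosses from arg0 into arg1.
      assert(!stepped_part_stays_in_one_input(0, 2, 1, len));
      // sel = { len, 0, 1, ... }: S = 1 stays inside arg0.
      assert(stepped_part_stays_in_one_input(0, 1, 1, len));
    }
}
```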

gcc/ChangeLog:
* fold-const.cc (INCLUDE_ALGORITHM): Add include.
(valid_mask_for_fold_vec_perm_cst_p): New function.
(fold_vec_perm_cst): Likewise.
(fold_vec_perm): Adjust assert and call fold_vec_perm_cst.
(test_fold_vec_perm_cst): New namespace.
(test_fold_vec_perm_cst::build_vec_cst_rand): New function.
(test_fold_vec_perm_cst::validate_res): Likewise.
(test_fold_vec_perm_cst::validate_res_vls): Likewise.
(test_fold_vec_perm_cst::builder_push_elems): Likewise.
(test_fold_vec_perm_cst::test_vnx4si_v4si): Likewise.
(test_fold_vec_perm_cst::test_v4si_vnx4si): Likewise.
(test_fold_vec_perm_cst::test_all_nunits): Likewise.
(test_fold_vec_perm_cst::test_nunits_min_2): Likewise.
(test_fold_vec_perm_cst::test_nunits_min_4): Likewise.
(test_fold_vec_perm_cst::test_nunits_min_8): Likewise.
(test_fold_vec_perm_cst::test_nunits_max_4): Likewise.
(test_fold_vec_perm_cst::is_simple_vla_size): Likewise.
(test_fold_vec_perm_cst::test): Likewise.
(fold_const_cc_tests): Call test_fold_vec_perm_cst::test.

Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>

Pan Li [Wed, 16 Aug 2023 07:55:42 +0000 (15:55 +0800)]

RISC-V: Support RVV VFWCVT.XU.F.V rounding mode intrinsic API

This patch supports the rounding mode API for VFWCVT.XU.F.V, as in
the samples below.

* __riscv_vfwcvt_xu_f_v_u64m2_rm
* __riscv_vfwcvt_xu_f_v_u64m2_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwcvt_xu_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wcvt-xu.c: New test.

Pan Li [Wed, 16 Aug 2023 09:40:20 +0000 (17:40 +0800)]

RISC-V: Fix one build error for template default arg

With some combinations of build options, the default value may result in
the error below. This patch fixes it by passing an explicit
argument.

riscv-vector-builtins-bases.cc:2495:24: error: invalid use of template-name \
‘riscv_vector::vfcvt_f’ without an argument list
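
One plausible way to hit this diagnostic is naming a class template that has
a default argument without any argument list in a language mode that lacks
class template argument deduction.  The sketch below uses hypothetical names
and is not the code from riscv-vector-builtins-bases.cc.

```cpp
#include <cassert>

enum rounding_kind { no_rounding, has_rounding };

// Class template whose only parameter has a default value.
template <rounding_kind Kind = no_rounding>
struct convert_op
{
  constexpr bool uses_rounding() const { return Kind == has_rounding; }
};

// Writing the type as plain `convert_op` relies on class template argument
// deduction, which is only available from C++17 on; older modes reject it:
//   static constexpr const convert_op bad_obj{};

// Spelling the argument explicitly is accepted in every language mode.
static constexpr const convert_op<no_rounding> plain_obj{};
static constexpr const convert_op<has_rounding> rounding_obj{};

inline void convert_op_demo()
{
  assert(!plain_obj.uses_rounding());
  assert(rounding_obj.uses_rounding());
}
```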

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Use explicit argument.

Pan Li [Wed, 16 Aug 2023 07:21:56 +0000 (15:21 +0800)]

RISC-V: Support RVV VFWCVT.X.F.V rounding mode intrinsic API

This patch supports the rounding mode API for VFWCVT.X.F.V, as in
the samples below.

* __riscv_vfwcvt_x_f_v_i64m2_rm
* __riscv_vfwcvt_x_f_v_i64m2_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwcvt_x_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wcvt-x.c: New test.

Pan Li [Wed, 16 Aug 2023 06:47:52 +0000 (14:47 +0800)]

RISC-V: Support RVV VFCVT.F.X.V and VFCVT.F.XU.V rounding mode intrinsic API

This patch supports the rounding mode API for VFCVT.F.X.V and
VFCVT.F.XU.V, as in the samples below.

* __riscv_vfcvt_f_x_v_f32m1_rm
* __riscv_vfcvt_f_x_v_f32m1_rm_m
* __riscv_vfcvt_f_xu_v_f32m1_rm
* __riscv_vfcvt_f_xu_v_f32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfcvt_f_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-cvt-f.c: New test.

Pan Li [Wed, 16 Aug 2023 06:18:35 +0000 (14:18 +0800)]

RISC-V: Support RVV VFCVT.XU.F.V rounding mode intrinsic API

This patch supports the rounding mode API for VFCVT.XU.F.V, as in
the samples below.

* __riscv_vfcvt_xu_f_v_u32m1_rm
* __riscv_vfcvt_xu_f_v_u32m1_rm_m

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfcvt_xu_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-cvt-xu.c: New test.

Haochen Gui [Wed, 16 Aug 2023 06:29:36 +0000 (14:29 +0800)]

rs6000: Skip unnecessary vector extract for certain elements.

If the extracted element index is:
- for byte, 7 on BE while 8 on LE;
- for half word, 3 on BE while 4 on LE;

the element to be stored is already in the corresponding place for
stxsi[hb]x. We don't need a redundant vector extraction at all.

gcc/
PR target/110429
* config/rs6000/vsx.md (*vsx_extract_<mode>_store_p9): Skip vector
extract when the element is 7 on BE while 8 on LE for byte or 3 on
BE while 4 on LE for halfword.

gcc/testsuite/
PR target/110429
* gcc.target/powerpc/pr110429.c: New.

Haochen Gui [Wed, 16 Aug 2023 06:21:09 +0000 (14:21 +0800)]

rs6000: Generate mfvsrwz for all platforms and remove redundant zero extend

mfvsrwz has lower latency than xxextractuw or vextuw[lr]x.  So it should be
generated even with p9 vector enabled.  Also the instruction is already
zero extended.  A combine pattern is needed to eliminate redundant zero
extend instructions.

gcc/
PR target/106769
* config/rs6000/vsx.md (expand vsx_extract_<mode>): Set it only
for V8HI and V16QI.
(vsx_extract_v4si): New expand for V4SI extraction.
(vsx_extract_v4si_w1): New insn pattern for V4SI extraction on
word 1 from BE order.
(*mfvsrwz): New insn pattern for mfvsrwz.
(*vsx_extract_<mode>_di_p9): Assert that it won't be generated on
word 1 from BE order.
(*vsx_extract_si): Remove.
(*vsx_extract_v4si_w023): New insn and split pattern on word 0, 2,
3 from BE order.

gcc/testsuite/
PR target/106769
* gcc.target/powerpc/pr106769.h: New.
* gcc.target/powerpc/pr106769-p8.c: New.
* gcc.target/powerpc/pr106769-p9.c: New.

Juzhe-Zhong [Wed, 16 Aug 2023 01:45:19 +0000 (09:45 +0800)]

RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}

This patch allows us to auto-vectorize the following case:

  void __attribute__ ((noinline, noclone))                                     \
  NAME##_8 (OUTTYPE *__restrict dest, INTYPE *__restrict src,                  \
            MASKTYPE *__restrict cond, intptr_t n)                             \
  {                                                                            \
    for (intptr_t i = 0; i < n; ++i)                                           \
      if (cond[i])                                                             \
        dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2]                \
           + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5]          \
           + src[i * 8 + 6] + src[i * 8 + 7]);                         \
  }

  TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, int32_t)                               \

  TEST2 (NAME##_i32, OUTTYPE, int32_t)                                         \

  TEST1 (NAME##_i32, int32_t)                                                  \

TEST (test)

ASM:

test_i32_i32_f32_8:
        ble a3,zero,.L5
.L3:
        vsetvli a4,a3,e8,mf4,ta,ma
        vle32.v v0,0(a2)
        vsetvli a5,zero,e32,m1,ta,ma
        vmsne.vi v0,v0,0
        vsetvli zero,a4,e32,m1,ta,ma
        vlseg8e32.v v8,(a1),v0.t
        vsetvli a5,zero,e32,m1,ta,ma
        slli a6,a4,2
        vadd.vv v1,v9,v8
        slli a7,a4,5
        vadd.vv v1,v1,v10
        sub a3,a3,a4
        vadd.vv v1,v1,v11
        vadd.vv v1,v1,v12
        vadd.vv v1,v1,v13
        vadd.vv v1,v1,v14
        vadd.vv v1,v1,v15
        vsetvli zero,a4,e32,m1,ta,ma
        vse32.v v1,0(a0),v0.t
        add a2,a2,a6
        add a1,a1,a7
        add a0,a0,a6
        bne a3,zero,.L3
.L5:
        ret
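
For readers without the test macros at hand, one concrete instantiation of
the loop might look like the sketch below.  The choice of int32_t for all
three types and the function name sum8_if are assumptions made for
illustration; they are not taken from the testsuite.

```cpp
#include <cassert>
#include <cstdint>

// For every i with a nonzero condition, sum a group of 8 consecutive source
// elements into dest[i].  The group of 8 is what a masked 8-field segment
// load (vlseg8e32.v ... v0.t in the assembly above) can fetch in one go.
void sum8_if(std::int32_t *__restrict dest, const std::int32_t *__restrict src,
             const std::int32_t *__restrict cond, std::intptr_t n)
{
  for (std::intptr_t i = 0; i < n; ++i)
    if (cond[i])
      dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2]
                 + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5]
                 + src[i * 8 + 6] + src[i * 8 + 7]);
}

inline void sum8_if_demo()
{
  std::int32_t src[8] = {1, 1, 1, 1, 1, 1, 1, 1};
  std::int32_t cond[1] = {1};
  std::int32_t dest[1] = {0};
  sum8_if(dest, src, cond, 1);
  assert(dest[0] == 8);
}
```

Elements whose condition is zero are left untouched, which is why both the
load and the store in the vectorized loop are masked.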

gcc/ChangeLog:

* config/riscv/autovec.md (vec_mask_len_load_lanes<mode><vsingle>):
New pattern.
(vec_mask_len_store_lanes<mode><vsingle>): Ditto.
* config/riscv/riscv-protos.h (expand_lanes_load_store): New function.
* config/riscv/riscv-v.cc (get_mask_mode): Add tuple mask mode.
(expand_lanes_load_store): New function.
* config/riscv/vector-iterators.md: New iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c:
Adapt test.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add lanes tests.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-10.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-11.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-12.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-13.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-14.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-15.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-16.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-17.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-18.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-8.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-9.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-15.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-16.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-17.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-18.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-9.c: New test.

commit | commitdiff | tree

Juzhe-Zhong [Tue, 15 Aug 2023 12:29:17 +0000 (20:29 +0800)]

VECT: Apply MASK_LEN_{LOAD_LANES, STORE_LANES} into vectorizer

Hi, Richard and Richi.

This patch adds MASK_LEN_{LOAD_LANES,STORE_LANES} support to the vectorizer.

Consider this simple case:

void __attribute__ ((noinline, noclone))
foo (int *__restrict a, int *__restrict b, int *__restrict c,
  int *__restrict d, int *__restrict e, int *__restrict f,
  int *__restrict g, int *__restrict h, int *__restrict j, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i] = j[i * 8];
      b[i] = j[i * 8 + 1];
      c[i] = j[i * 8 + 2];
      d[i] = j[i * 8 + 3];
      e[i] = j[i * 8 + 4];
      f[i] = j[i * 8 + 5];
      g[i] = j[i * 8 + 6];
      h[i] = j[i * 8 + 7];
    }
}

RVV Gimple IR:

  _79 = .SELECT_VL (ivtmp_81, POLY_INT_CST [4, 4]);
  ivtmp_125 = _79 * 32;
  vect_array.8 = .MASK_LEN_LOAD_LANES (vectp_j.6_124, 32B, { -1, ... }, _79, 0);
  vect__8.9_122 = vect_array.8[0];
  vect__8.10_121 = vect_array.8[1];
  vect__8.11_120 = vect_array.8[2];
  vect__8.12_119 = vect_array.8[3];
  vect__8.13_118 = vect_array.8[4];
  vect__8.14_117 = vect_array.8[5];
  vect__8.15_116 = vect_array.8[6];
  vect__8.16_115 = vect_array.8[7];
  vect_array.8 ={v} {CLOBBER};
  ivtmp_114 = _79 * 4;
  .MASK_LEN_STORE (vectp_a.17_113, 32B, { -1, ... }, _79, 0, vect__8.9_122);
  .MASK_LEN_STORE (vectp_b.19_109, 32B, { -1, ... }, _79, 0, vect__8.10_121);
  .MASK_LEN_STORE (vectp_c.21_105, 32B, { -1, ... }, _79, 0, vect__8.11_120);
  .MASK_LEN_STORE (vectp_d.23_101, 32B, { -1, ... }, _79, 0, vect__8.12_119);
  .MASK_LEN_STORE (vectp_e.25_97, 32B, { -1, ... }, _79, 0, vect__8.13_118);
  .MASK_LEN_STORE (vectp_f.27_93, 32B, { -1, ... }, _79, 0, vect__8.14_117);
  .MASK_LEN_STORE (vectp_g.29_89, 32B, { -1, ... }, _79, 0, vect__8.15_116);
  .MASK_LEN_STORE (vectp_h.31_85, 32B, { -1, ... }, _79, 0, vect__8.16_115);
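
For readers unfamiliar with the lanes internal functions, the effect of the .MASK_LEN_LOAD_LANES call above can be modelled in scalar C. This is only a sketch under stated assumptions: the function name, the one-byte-per-element mask and the lane-major destination layout are hypothetical, and treating LEN + BIAS as the number of active elements is assumed from the convention of the other MASK_LEN_* functions rather than stated in this message.

```c
#include <stddef.h>

/* Scalar model of a masked, length-controlled load of NLANES
   interleaved lanes.  Group I of SRC holds one element per lane.
   Group I is loaded only if I < LEN + BIAS and MASK[I] is nonzero;
   other elements of DST are left untouched.  DST is lane-major:
   lane K starts at DST[K * STRIDE].  All names are hypothetical.  */
void
model_mask_len_load_lanes (int *dst, size_t stride, const int *src,
                           size_t nlanes, const unsigned char *mask,
                           size_t len, int bias)
{
  ptrdiff_t active = (ptrdiff_t) len + bias;
  size_t n = active > 0 ? (size_t) active : 0;

  for (size_t i = 0; i < n; ++i)
    if (mask[i])
      for (size_t k = 0; k < nlanes; ++k)
        dst[k * stride + i] = src[i * nlanes + k];
}
```

With NLANES = 8 this corresponds to the de-interleaving that foo performs on j[i * 8 + k].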

ASM:

foo:
lw t4,8(sp)
ld t5,0(sp)
ble t4,zero,.L5
.L3:
vsetvli t1,t4,e8,mf4,ta,ma
vlseg8e32.v v8,(t5)
slli t3,t1,2
slli t6,t1,5
vse32.v v8,0(a0)
vse32.v v9,0(a1)
vse32.v v10,0(a2)
vse32.v v11,0(a3)
vse32.v v12,0(a4)
vse32.v v13,0(a5)
vse32.v v14,0(a6)
vse32.v v15,0(a7)
sub t4,t4,t1
add t5,t5,t6
add a0,a0,t3
add a1,a1,t3
add a2,a2,t3
add a3,a3,t3
add a4,a4,t3
add a5,a5,t3
add a6,a6,t3
add a7,a7,t3
bne t4,zero,.L3
.L5:
ret

The details of the approach:

Step 1 - Modify the LANES LOAD/STORE support functions (vect_load_lanes_supported/vect_store_lanes_supported):

+/* Return FN if vec_{masked_,mask_len,}load_lanes is available for COUNT
+   vectors of type VECTYPE.  MASKED_P says whether the masked form is needed. */

-bool
+internal_fn
 vect_load_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count,
                            bool masked_p)
 {
-  if (masked_p)
-    return vect_lanes_optab_supported_p ("vec_mask_load_lanes",
-                                         vec_mask_load_lanes_optab,
-                                         vectype, count);
+  if (vect_lanes_optab_supported_p ("vec_mask_len_load_lanes",
+                                    vec_mask_len_load_lanes_optab,
+                                    vectype, count))
+    return IFN_MASK_LEN_LOAD_LANES;
+  else if (masked_p)
+    {
+      if (vect_lanes_optab_supported_p ("vec_mask_load_lanes",
+                                        vec_mask_load_lanes_optab,
+                                        vectype, count))
+        return IFN_MASK_LOAD_LANES;
+    }
   else
-    return vect_lanes_optab_supported_p ("vec_load_lanes",
-                                         vec_load_lanes_optab,
-                                         vectype, count);
+    {
+      if (vect_lanes_optab_supported_p ("vec_load_lanes",
+                                        vec_load_lanes_optab,
+                                        vectype, count))
+        return IFN_LOAD_LANES;
+    }
+  return IFN_LAST;
 }

Instead of returning TRUE or FALSE to say whether the target supports LANES LOAD/STORE,
the function now returns the internal_fn of the LANES LOAD/STORE variant that the target supports.
If the target supports none of the LANES LOAD/STORE optabs, it returns IFN_LAST.
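
The change of return type can be illustrated outside GCC. The sketch below is a standalone model, not GCC code: the enum merely mirrors the IFN_* names used above, and struct target_optabs with its three flags is a hypothetical stand-in for the optab queries.

```c
/* Hypothetical stand-ins for the internal_fn values named above.  */
enum lanes_ifn
{
  IFN_LOAD_LANES,
  IFN_MASK_LOAD_LANES,
  IFN_MASK_LEN_LOAD_LANES,
  IFN_LAST   /* sentinel: no lanes optab is available */
};

/* Hypothetical description of which load-lanes optabs a target has.  */
struct target_optabs
{
  int has_load_lanes;
  int has_mask_load_lanes;
  int has_mask_len_load_lanes;
};

/* Same decision order as the patched function: prefer the
   mask+len form, then the masked or unmasked form as requested.  */
enum lanes_ifn
load_lanes_supported (const struct target_optabs *t, int masked_p)
{
  if (t->has_mask_len_load_lanes)
    return IFN_MASK_LEN_LOAD_LANES;
  else if (masked_p)
    {
      if (t->has_mask_load_lanes)
        return IFN_MASK_LOAD_LANES;
    }
  else
    {
      if (t->has_load_lanes)
        return IFN_LOAD_LANES;
    }
  return IFN_LAST;
}
```

In this model callers have to compare the result against IFN_LAST explicitly. Testing it for truthiness would be wrong, because a valid function code (here IFN_LOAD_LANES) can have the value zero.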

Step 2 - Compute the IFN for LANES LOAD/STORE (computed only once):

      if (!STMT_VINFO_STRIDED_P (first_stmt_info)
          && (can_overrun_p || !would_overrun_p)
          && compare_step_with_zero (vinfo, stmt_info) > 0)
        {
          /* First cope with the degenerate case of a single-element
             vector.  */
          if (known_eq (TYPE_VECTOR_SUBPARTS (vectype), 1U))
            ;

          else
            {
              /* Otherwise try using LOAD/STORE_LANES.  */
              *lanes_ifn
                = vls_type == VLS_LOAD
                    ? vect_load_lanes_supported (vectype, group_size, masked_p)
                    : vect_store_lanes_supported (vectype, group_size,
                                                  masked_p);
              if (*lanes_ifn != IFN_LAST)
                {
                  *memory_access_type = VMAT_LOAD_STORE_LANES;
                  overrun_p = would_overrun_p;
                }

              /* If that fails, try using permuting loads.  */
              else if (vls_type == VLS_LOAD
                       ? vect_grouped_load_supported (vectype,
                                                      single_element_p,
                                                      group_size)
                       : vect_grouped_store_supported (vectype, group_size))
                {
                  *memory_access_type = VMAT_CONTIGUOUS_PERMUTE;
                  overrun_p = would_overrun_p;
                }
            }
        }

Step 3 - Build the MASK_LEN_{LOAD_LANES,STORE_LANES} Gimple IR:

+   if (lanes_ifn == IFN_MASK_LEN_STORE_LANES)
+     {
+       if (loop_lens)
+         final_len = vect_get_loop_len (loop_vinfo, gsi, loop_lens,
+                                        ncopies, vectype, j, 1);
+       else
+         final_len = size_int (TYPE_VECTOR_SUBPARTS (vectype));
+       signed char biasval
+         = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+       bias = build_int_cst (intQI_type_node, biasval);
+       if (!final_mask)
+         {
+           mask_vectype = truth_type_for (vectype);
+           final_mask = build_minus_one_cst (mask_vectype);
+         }
+     }
+
    gcall *call;
-   if (final_mask)
+   if (final_len && final_mask)
+     {
+       /* Emit:
+            MASK_LEN_STORE_LANES (DATAREF_PTR, ALIAS_PTR, VEC_MASK,
+                                  LEN, BIAS, VEC_ARRAY).  */
+       unsigned int align = TYPE_ALIGN (TREE_TYPE (vectype));
+       tree alias_ptr = build_int_cst (ref_type, align);
+       call = gimple_build_call_internal (IFN_MASK_LEN_STORE_LANES, 6,
+                                          dataref_ptr, alias_ptr,
+                                          final_mask, final_len, bias,
+                                          vec_array);
+     }
+   else if (final_mask)

The LEN and MASK flow is the same as for the other MASK_LEN_* load/store functions.
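
The store direction, including the all-ones mask that Step 3 synthesizes when the loop has no mask of its own, can be modelled the same way in scalar C. This is a sketch only: the function name and data layout are hypothetical, a null MASK pointer stands in for the all-ones mask, and LEN + BIAS is assumed to be the active element count.

```c
#include <stddef.h>

/* Scalar model of a masked, length-controlled interleaving store.
   LANES is lane-major with STRIDE elements per lane.  A null MASK
   models an all-ones mask.  Group I is written to DST only if
   I < LEN + BIAS and its mask element is set.  Hypothetical names.  */
void
model_mask_len_store_lanes (int *dst, const int *lanes, size_t stride,
                            size_t nlanes, const unsigned char *mask,
                            size_t len, int bias)
{
  ptrdiff_t active = (ptrdiff_t) len + bias;
  size_t n = active > 0 ? (size_t) active : 0;

  for (size_t i = 0; i < n; ++i)
    if (!mask || mask[i])
      for (size_t k = 0; k < nlanes; ++k)
        dst[i * nlanes + k] = lanes[k * stride + i];
}
```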

gcc/ChangeLog:

* internal-fn.cc (internal_load_fn_p): Apply
MASK_LEN_{LOAD_LANES,STORE_LANES} into vectorizer.
(internal_store_fn_p): Ditto.
(internal_fn_len_index): Ditto.
(internal_fn_mask_index): Ditto.
(internal_fn_stored_value_index): Ditto.
* tree-vect-data-refs.cc (vect_store_lanes_supported): Ditto.
(vect_load_lanes_supported): Ditto.
* tree-vect-loop.cc: Ditto.
* tree-vect-slp.cc (vect_slp_prefer_store_lanes_p): Ditto.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
(get_group_load_store_type): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (vect_store_lanes_supported): Ditto.
(vect_load_lanes_supported): Ditto.

commit | commitdiff | tree

Pan Li [Wed, 16 Aug 2023 05:15:04 +0000 (13:15 +0800)]

RISC-V: Support RVV VFCVT.X.F.V rounding mode intrinsic API

This patch adds the rounding mode API for VFCVT.X.F.V, as in the
samples below.

* __riscv_vfcvt_x_f_v_i32m1_rm
* __riscv_vfcvt_x_f_v_i32m1_rm_m
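
What an explicit rounding mode means for a float-to-integer conversion can be shown per element in plain C. The sketch below is not the intrinsic and does not use RVV: the enum and function names are hypothetical, only the numeric values follow the RISC-V frm encoding, and the input is assumed to be finite and within int32_t range.

```c
#include <stdint.h>

/* Values match the RISC-V frm field encoding.  */
enum frm_mode
{
  FRM_RNE = 0,  /* round to nearest, ties to even */
  FRM_RTZ = 1,  /* round towards zero */
  FRM_RDN = 2,  /* round down (towards -infinity) */
  FRM_RUP = 3,  /* round up (towards +infinity) */
  FRM_RMM = 4   /* round to nearest, ties to max magnitude */
};

/* Scalar model of one element of a float -> int32 conversion.
   Unknown modes are treated as FRM_RNE in this sketch.  */
int32_t
model_cvt_x_f (float x, enum frm_mode rm)
{
  int32_t t = (int32_t) x;      /* C casts truncate towards zero */
  float frac = x - (float) t;   /* same sign as x, magnitude below 1 */

  switch (rm)
    {
    case FRM_RTZ:
      return t;
    case FRM_RDN:
      return frac < 0.0f ? t - 1 : t;
    case FRM_RUP:
      return frac > 0.0f ? t + 1 : t;
    case FRM_RMM:
      if (frac >= 0.5f)
        return t + 1;
      if (frac <= -0.5f)
        return t - 1;
      return t;
    case FRM_RNE:
    default:
      if (frac > 0.5f || (frac == 0.5f && (t & 1)))
        return t + 1;
      if (frac < -0.5f || (frac == -0.5f && (t & 1)))
        return t - 1;
      return t;
    }
}
```
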

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(enum frm_op_type): New type for frm.
(BASE): New declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfcvt_x_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-cvt-x.c: New test.

commit | commitdiff | tree

liuhongt [Thu, 10 Aug 2023 08:26:13 +0000 (16:26 +0800)]

Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions

Rename the original use_gather to use_gather_8parts. Support
-mtune-ctrl={,^}use_gather to set/clear the tune features
use_gather_{2parts, 4parts, 8parts}. Support the new option -m[no-]gather
as an alias of -mtune-ctrl={,^}use_gather.

Similar for use_scatter.
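
The set/clear behaviour described above can be sketched as a small standalone parser. This is an illustration, not the code in parse_mtune_ctrl_str: the flag constants, the table and apply_tune_ctrl are hypothetical, the list is assumed to be comma-separated, and unknown names are silently ignored here for brevity.

```c
#include <stddef.h>
#include <string.h>

enum
{
  USE_GATHER_2PARTS = 1,
  USE_GATHER_4PARTS = 2,
  USE_GATHER_8PARTS = 4,
  USE_GATHER_ALL = 7   /* the umbrella name covers all three */
};

struct tune_name
{
  const char *name;
  unsigned bits;
};

static const struct tune_name tune_names[] = {
  { "use_gather_2parts", USE_GATHER_2PARTS },
  { "use_gather_4parts", USE_GATHER_4PARTS },
  { "use_gather_8parts", USE_GATHER_8PARTS },
  { "use_gather", USE_GATHER_ALL },
};

/* Apply a comma-separated feature list to FEATURES.  A leading '^'
   clears the named feature bits, otherwise they are set.  */
unsigned
apply_tune_ctrl (unsigned features, const char *list)
{
  while (*list)
    {
      size_t n = strcspn (list, ",");
      int clear = n > 0 && list[0] == '^';
      const char *name = list + clear;
      size_t len = n - (size_t) clear;

      for (size_t i = 0; i < sizeof tune_names / sizeof tune_names[0]; ++i)
        if (strlen (tune_names[i].name) == len
            && memcmp (tune_names[i].name, name, len) == 0)
          {
            if (clear)
              features &= ~tune_names[i].bits;
            else
              features |= tune_names[i].bits;
          }

      list += n;
      if (*list == ',')
        ++list;
    }
  return features;
}
```

In this model -mgather would correspond to apply_tune_ctrl (f, "use_gather") and -mno-gather to apply_tune_ctrl (f, "^use_gather").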

gcc/ChangeLog:

* config/i386/i386-builtins.cc
(ix86_vectorize_builtin_gather): Adjust for use_gather_8parts.
* config/i386/i386-options.cc (parse_mtune_ctrl_str):
Set/Clear tune features use_{gather,scatter}_{2parts, 4parts,
8parts} for -mtune-ctrl={,^}{use_gather,use_scatter}.
* config/i386/i386.cc (ix86_vectorize_builtin_scatter): Adjust
for use_scatter_8parts
* config/i386/i386.h (TARGET_USE_GATHER): Rename to ..
(TARGET_USE_GATHER_8PARTS): .. this.
(TARGET_USE_SCATTER): Rename to ..
(TARGET_USE_SCATTER_8PARTS): .. this.
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER): Rename to
(X86_TUNE_USE_GATHER_8PARTS): .. this.
(X86_TUNE_USE_SCATTER): Rename to
(X86_TUNE_USE_SCATTER_8PARTS): .. this.
* config/i386/i386.opt: Add new options mgather, mscatter.

commit | commitdiff | tree

liuhongt [Thu, 10 Aug 2023 03:41:39 +0000 (11:41 +0800)]

Software mitigation: Disable gather generation in vectorization for GDS affected Intel Processors.

For more details of GDS (Gather Data Sampling), refer to
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html

After the microcode update there is a performance regression. To avoid
it, the patch disables gather generation in autovectorization and uses
scalar emulation of gathers instead.
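
For context, the kind of loop affected is an indexed load such as the one below. This is a hypothetical example, not one of the adjusted tests: a vectorizer may implement the indexed read with a hardware gather instruction, or, when gathers are disabled by tuning, fetch each element with an ordinary scalar load.

```c
#include <stddef.h>

/* An indexed-load loop: each output element is read from BASE at a
   position given by IDX, which is the access pattern a gather
   instruction implements.  */
void
gather_load (double *restrict out, const double *restrict base,
             const int *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    out[i] = base[idx[i]];
}
```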

gcc/ChangeLog:

* config/i386/i386-options.cc (m_GDS): New macro.
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Don't
enable for m_GDS.
(X86_TUNE_USE_GATHER_4PARTS): Ditto.
(X86_TUNE_USE_GATHER): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx2-gather-2.c: Adjust options to keep
gather vectorization.
* gcc.target/i386/avx2-gather-6.c: Ditto.
* gcc.target/i386/avx512f-pr88464-1.c: Ditto.
* gcc.target/i386/avx512f-pr88464-5.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-1.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-11.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-3.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-9.c: Ditto.
* gcc.target/i386/pr88531-1b.c: Ditto.
* gcc.target/i386/pr88531-1c.c: Ditto.

commit | commitdiff | tree

liuhongt [Thu, 20 Jul 2023 01:38:09 +0000 (09:38 +0800)]

Generate vmovapd instead of vmovsd for moving DFmode between SSE_REGS.

vmovapd can enable register renaming and has the same code size as
vmovsd. Similarly for vmovsh vs. vmovaps: vmovaps is 1 byte shorter than
vmovsh.

When TARGET_AVX512VL is not available, still generate
vmovsd/vmovss/vmovsh to avoid vmovapd/vmovaps zmm16-31.

gcc/ChangeLog:

* config/i386/i386.md (movdf_internal): Generate vmovapd instead of
vmovsd when moving DFmode between SSE_REGS.
(movhi_internal): Generate vmovdqa instead of vmovsh when
moving HImode between SSE_REGS.
(mov<mode>_internal): Use vmovaps instead of vmovsh when
moving HF/BFmode between SSE_REGS.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr89229-4a.c: Adjust testcase.

commit | commitdiff | tree

GCC Administrator [Wed, 16 Aug 2023 00:17:10 +0000 (00:17 +0000)]

Daily bump.

commit | commitdiff | tree

David Faust [Tue, 15 Aug 2023 18:11:23 +0000 (11:11 -0700)]

bpf: remove useless define_insn for extendsisi2

This define_insn is never used, since a sign-extend to the same mode is
just a move, so delete it.

gcc/

* config/bpf/bpf.md (extendsisi2): Delete useless define_insn.

commit | commitdiff | tree

David Faust [Tue, 15 Aug 2023 17:54:17 +0000 (10:54 -0700)]

bpf: fix pseudoc w regs for small modes [PR111029]

In the BPF pseudo-c assembly dialect, registers treated as 32-bits
rather than the full 64 in various instructions ought to be printed as
"wN" rather than "rN". But bpf_print_register () was only doing this
for specifically SImode registers, meaning smaller modes were printed
incorrectly.

This caused assembler errors like:

Error: unrecognized instruction `w2 =(s8)r1'

for a 32-bit sign-extending register move instruction, where the source
register is used in QImode.

Fix bpf_print_register () to print the "w" version of register when
specified by the template for any mode 32-bits or smaller.
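
The rule being fixed can be reduced to a one-line predicate. The sketch below is a model with hypothetical names, not the code of bpf_print_register; the mode is represented only by its width in bits.

```c
/* Model of the register-prefix choice after the fix.  WANT_W is
   nonzero when the instruction template asks for the 32-bit name.  */
char
reg_prefix (unsigned mode_bits, int want_w)
{
  return (want_w && mode_bits <= 32) ? 'w' : 'r';
}

/* The behaviour described as buggy: only exactly 32-bit (SImode)
   registers received the 'w' prefix.  */
char
reg_prefix_before_fix (unsigned mode_bits, int want_w)
{
  return (want_w && mode_bits == 32) ? 'w' : 'r';
}
```

For an 8-bit (QImode) source register the old rule yields 'r', which matches the rejected `w2 =(s8)r1' above, while the new rule yields 'w'.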

PR target/111029

gcc/
* config/bpf/bpf.cc (bpf_print_register): Print 'w' registers
for any mode 32-bits or smaller, not just SImode.

gcc/testsuite/

* gcc.target/bpf/smov-2.c: New test.
* gcc.target/bpf/smov-pseudoc-2.c: New test.

commit | commitdiff | tree

Martin Jambor [Tue, 15 Aug 2023 15:26:13 +0000 (17:26 +0200)]

Feed results of IPA-CP into tree value numbering

PRs 68930 and 92497 show that when IPA-CP figures out constants in
aggregate parameters or when passed by reference but the loads happen
in an inlined function the information is lost.  This happens even
when the inlined function itself was known to have - or even cloned to
have - such constants in incoming parameters because the transform
phase of IPA passes is not run on them.  See discussion in the bugs
for reasons why.

Honza suggested that we can plug the results of IPA-CP analysis into
value numbering, so that FRE can figure out that some loads fetch
known constants.  This is what this patch attempts to do.  The patch
does not attempt to populate partial_defs with information from
IPA-CP, this can be hopefully added as a follow-up.
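
A hypothetical example of the code shape in question (not taken from the PRs or the new tests) is a struct passed by reference with known contents, where the loads sit in a helper that gets inlined. Whether GCC actually folds the loads here depends on the options and heuristics in effect, so this is only an illustration.

```c
struct config
{
  int scale;
  int offset;
};

/* Small helper that is likely to be inlined into its caller.  */
static inline int
apply (const struct config *c, int x)
{
  return x * c->scale + c->offset;   /* loads through the reference */
}

/* Kept out of line so the constants must cross a call boundary,
   which is where IPA-CP can record them for the parameter.  */
static int __attribute__ ((noinline))
worker (const struct config *c, int x)
{
  return apply (c, x);
}

int
entry (int x)
{
  struct config c = { 2, 3 };   /* constants known at the call site */
  return worker (&c, x);
}
```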

gcc/ChangeLog:

2023-08-11  Martin Jambor  <mjambor@suse.cz>

PR ipa/68930
PR ipa/92497
* ipa-prop.h (ipcp_get_aggregate_const): Declare.
* ipa-prop.cc (ipcp_get_aggregate_const): New function.
(ipcp_transform_function): Do not deallocate transformation info.
* tree-ssa-sccvn.cc: Include alloc-pool.h, symbol-summary.h and
ipa-prop.h.
(vn_reference_lookup_2): When hitting default-def vuse, query
IPA-CP transformation info for any known constants.

gcc/testsuite/ChangeLog:

2023-06-07  Martin Jambor  <mjambor@suse.cz>

PR ipa/68930
PR ipa/92497
* gcc.dg/ipa/pr92497-1.c: New test.
* gcc.dg/ipa/pr92497-2.c: Likewise.

commit | commitdiff | tree

Iain Buclaw [Tue, 15 Aug 2023 15:10:45 +0000 (17:10 +0200)]

d: Add test case for PR110959.

This ICE is specific to the D front-end language version in GDC 12,
however a test has been added to mainline to catch the unlikely event of
a regression.

PR d/110959

gcc/testsuite/ChangeLog:

* gdc.dg/pr110959.d: New test.

commit | commitdiff | tree

Martin Jambor [Tue, 15 Aug 2023 15:13:44 +0000 (17:13 +0200)]

Fortran: Avoid accessing gfc_charlen when not looking at BT_CHARACTER (PR 110677)

This patch addresses an issue uncovered by the undefined behavior
sanitizer.  In function resolve_structure_cons in resolve.cc there is
a test starting with:

      if (cons->expr->ts.type == BT_CHARACTER && comp->ts.u.cl
  && comp->ts.u.cl->length
  && comp->ts.u.cl->length->expr_type == EXPR_CONSTANT

and UBSAN complained of loads from comp->ts.u.cl->length->expr_type of
integer value 1818451807, which is outside the value range of the expr_t
enum.  If I understand the code correctly, the entire load was
unwanted because comp->ts.type in those cases is BT_CLASS and not
BT_CHARACTER.  This patch simply adds a check to make sure it is only
accessed when comp->ts.type is BT_CHARACTER.

During review, Harald Anlauf noticed that length types also need to be
checked, so I also added the checks he suggested to the condition.
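
The underlying pattern is a tagged union whose member may only be read after checking the tag. The following is a simplified standalone model with hypothetical type definitions that only borrow the names from the snippet above; it is not the gfortran data structure.

```c
#include <stddef.h>

enum basic_type { BT_CHARACTER, BT_CLASS };
enum expr_kind { EXPR_CONSTANT, EXPR_VARIABLE };

struct expr { enum expr_kind expr_type; };
struct charlen { struct expr *length; };
struct derived_info { int id; };

struct typespec
{
  enum basic_type type;
  union
  {
    struct charlen *cl;            /* valid only for BT_CHARACTER */
    struct derived_info *derived;  /* valid only for BT_CLASS */
  } u;
};

/* Check the tag first, so that u.cl is never interpreted when the
   union actually holds a different member.  */
int
has_constant_charlen (const struct typespec *ts)
{
  return ts->type == BT_CHARACTER
         && ts->u.cl != NULL
         && ts->u.cl->length != NULL
         && ts->u.cl->length->expr_type == EXPR_CONSTANT;
}
```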

Co-authored-by: Harald Anlauf <anlauf@gmx.de>
gcc/fortran/ChangeLog:

2023-08-14  Martin Jambor  <mjambor@suse.cz>

PR fortran/110677
* resolve.cc (resolve_structure_cons): Check comp->ts is character
type before accessing stuff through comp->ts.u.cl.

commit | commitdiff | tree

Chung-Lin Tang [Tue, 6 Jun 2023 10:46:29 +0000 (03:46 -0700)]

OpenACC 2.7: default clause support for data constructs

This patch implements the OpenACC 2.7 addition of default(none|present) support
for data constructs.

Now, specifying "default(none|present)" on a data construct turns on the same
default clause behavior for all lexically enclosed compute constructs (which
don't already have a default clause themselves).
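
A minimal sketch of the usage, assuming a compiler with this support and -fopenacc (without OpenACC enabled the pragmas are ignored and the loop runs on the host). The function is hypothetical and not one of the adjusted tests; the scalars are listed explicitly in firstprivate so that nothing relies on implicit data attributes under default(none).

```c
/* The default(none) on the data construct also applies to the
   enclosed parallel loop, which has no default clause of its own.  */
void
scale (float *restrict a, int n, float s)
{
#pragma acc data default(none) copy(a[0:n])
  {
#pragma acc parallel loop firstprivate(n, s)
    for (int i = 0; i < n; ++i)
      a[i] *= s;
  }
}
```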

gcc/c/ChangeLog:
* c-parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/cp/ChangeLog:
* parser.cc (OACC_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.

gcc/fortran/ChangeLog:
* openmp.cc (OACC_DATA_CLAUSES): Add OMP_CLAUSE_DEFAULT.

gcc/ChangeLog:
* gimplify.cc (oacc_region_type_name): New function.
(oacc_default_clause): If no 'default' clause appears on this
compute construct, see if one appears on a lexically containing
'data' construct.
(gimplify_scan_omp_clauses): Upon OMP_CLAUSE_DEFAULT case, set
ctx->oacc_default_clause_ctx to current context.

gcc/testsuite/ChangeLog:
* c-c++-common/goacc/default-3.c: Adjust testcase.
* c-c++-common/goacc/default-4.c: Adjust testcase.
* c-c++-common/goacc/default-5.c: Adjust testcase.
* gfortran.dg/goacc/default-3.f95: Adjust testcase.
* gfortran.dg/goacc/default-4.f: Adjust testcase.
* gfortran.dg/goacc/default-5.f: Adjust testcase.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>

commit | commitdiff | tree

Juzhe-Zhong [Sat, 12 Aug 2023 14:15:15 +0000 (22:15 +0800)]

RISC-V: Fix autovec_length_operand predicate[PR110989]

The autovec_length_operand predicate is currently misconfigured, as
discovered in PR110989 in the following situation:

vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99, POLY_INT_CST [2, 2], 0); ---> dummy length = VF.

The current autovec_length_operand fails to recognize the VF dummy length.

-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=scalable -Ofast -fno-schedule-insns -fno-schedule-insns2:

Before this patch:

srli a4,s0,2
addi a4,a4,-3
srli s0,s0,3
vsetvli a5,zero,e64,m1,ta,ma
vid.v v1
vmul.vx v1,v1,a4
addi a4,s0,-2
vadd.vx v1,v1,a4
addi a4,s0,-1
vslide1up.vx v2,v1,a4
vmv.v.x v1,a4
vand.vv v1,v2,v1
vl1re64.v v3,0(t2)
vrgather.vv v2,v3,v1
vmv.v.i v1,0
vmfeq.vv v0,v2,v1
vsetvli zero,s0,e32,mf2,ta,ma ---> s0 = POLY (2,2)
vle32.v v3,0(t3),v0.t
vsetvli a5,zero,e64,m1,ta,ma
vmfne.vv v0,v2,v1
vsetvli zero,zero,e32,mf2,ta,ma
vfwcvt.f.x.v v1,v3
vsetvli zero,zero,e64,m1,ta,ma
vmerge.vvm v1,v1,v2,v0
vslidedown.vx v1,v1,a4
vfmv.f.s fa5,v1
j .L6

After this patch:

srli a4,s0,2
addi a4,a4,-3
srli s0,s0,3
vsetvli a5,zero,e64,m1,ta,ma
vid.v v1
vmul.vx v1,v1,a4
addi a4,s0,-2
vadd.vx v1,v1,a4
addi s0,s0,-1
vslide1up.vx v2,v1,s0
vmv.v.x v1,s0
vand.vv v1,v2,v1
vl1re64.v v3,0(t2)
vrgather.vv v2,v3,v1
vmv.v.i v1,0
vmfeq.vv v0,v2,v1
vle32.v v3,0(t3),v0.t
vmfne.vv v0,v2,v1
vsetvli zero,zero,e32,mf2,ta,ma
vfwcvt.f.x.v v1,v3
vsetvli zero,zero,e64,m1,ta,ma
vmerge.vvm v1,v1,v2,v0
vslidedown.vx v1,v1,s0
vfmv.f.s fa5,v1
j .L6

Two vsetvli insns are eliminated.

gcc/ChangeLog:

PR target/110989
* config/riscv/predicates.md: Fix predicate.

gcc/testsuite/ChangeLog:

PR target/110989
* gcc.target/riscv/rvv/autovec/pr110989.c: Add vsetvli assembly check.

commit | commitdiff | tree

Richard Biener [Tue, 15 Aug 2023 12:39:15 +0000 (14:39 +0200)]

Cleanup BB vectorization roots handling

The following moves CONSTRUCTOR handling into the generic BB
vectorization roots handling, removing a special case and finally
renaming the function now consisting of more than just constructor
detection.

* tree-vect-slp.cc (vect_analyze_slp_instance): Remove
slp_inst_kind_ctor handling.
(vect_analyze_slp): Simplify.
(vect_build_slp_instance): Dump when we analyze a CTOR.
(vect_slp_check_for_constructors): Rename to ...
(vect_slp_check_for_roots): ... this. Register a
slp_root for CONSTRUCTORs instead of shoving them to
the set of grouped stores.
(vect_slp_analyze_bb_1): Adjust.

commit | commitdiff | tree

Richard Biener [Tue, 15 Aug 2023 11:05:32 +0000 (13:05 +0200)]

Support constants and externals in BB reduction vectorization

The following supports vectorizing BB reductions involving a
constant or an invariant.
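
A hypothetical example of such a reduction (not the new testcase) is a straight-line sum in which four loads can form one vector while a constant and an invariant operand remain outside it. Whether GCC vectorizes this particular function depends on the target and the cost model.

```c
/* Straight-line reduction: a[0..3] are candidates for a single vector
   load, while 42 and BIAS are the constant and invariant operands
   that are not part of any vector lane.  */
int
sum4_plus (const int *a, int bias)
{
  return a[0] + a[1] + a[2] + a[3] + 42 + bias;
}
```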

* tree-vectorizer.h (_slp_instance::remain_stmts): Change
to ...
(_slp_instance::remain_defs): ... this.
(SLP_INSTANCE_REMAIN_STMTS): Rename to ...
(SLP_INSTANCE_REMAIN_DEFS): ... this.
(slp_root::remain): New.
(slp_root::slp_root): Adjust.
* tree-vect-slp.cc (vect_free_slp_instance): Adjust.
(vect_build_slp_instance): Get extra remain parameter,
adjust former handling of a cut off stmt.
(vect_analyze_slp_instance): Adjust.
(vect_analyze_slp): Likewise.
(_bb_vec_info::~_bb_vec_info): Likewise.
(vectorizable_bb_reduc_epilogue): Dump something if we fail.
(vect_slp_check_for_constructors): Handle non-internal
defs as remain defs of a reduction.
(vectorize_slp_instance_root_stmt): Adjust.

* gcc.dg/vect/bb-slp-75.c: New testcase.

commit | commitdiff | tree

Richard Biener [Tue, 15 Aug 2023 08:31:07 +0000 (10:31 +0200)]

Use find_loop_location from unrolling

The following uses the common find_loop_location as implemented
by the vectorizer to query a loop location also for unrolling.
That results in a more consistent reporting of locations.

* tree-ssa-loop-ivcanon.cc: Include tree-vectorizer.h
(canonicalize_loop_induction_variables): Use find_loop_location.

commit | commitdiff | tree

Hans-Peter Nilsson [Tue, 15 Aug 2023 04:35:43 +0000 (06:35 +0200)]

CRIS: Don't include tree.h in cris-protos.h, PR bootstrap/111021

While there's another patch that fixes the immediate error
in the PR by other means, the include of tree.h here is
something I prefer to avoid.

PR bootstrap/111021
* config/cris/cris-protos.h: Revert recent change.
* config/cris/cris.cc (cris_legitimate_address_p): Remove
code_helper unused parameter.
(cris_legitimate_address_p_hook): New wrapper function.
(TARGET_LEGITIMATE_ADDRESS_P): Change to
cris_legitimate_address_p_hook.

commit | commitdiff | tree

Richard Biener [Thu, 10 Aug 2023 11:55:36 +0000 (13:55 +0200)]

tree-optimization/110963 - more PRE when optimizing for size

The following adjusts the heuristic when we perform PHI insertion
during GIMPLE PRE from requiring at least one edge that is supposed
to be optimized for speed to also doing insertion when the expression
is available on all edges (but possibly with different value) and
we'd at most have one copy from a constant.  The first ensures
we optimize two computations on all paths to one plus a possible
copy due to the PHI, the second makes sure we do not need to insert
many possibly large copies from constants, disregarding the
cumulative size cost of the register copies when they are not
coalesced.

The case in the testcase is

  <bb 5>
  _14 = h;
  if (_14 == 0B)
    goto <bb 7>;
  else
    goto <bb 6>;

  <bb 6>
  h = 0B;

  <bb 7>
  h.6_12 = h;

and we want to optimize that to

  <bb 7>
  # h.6_12 = PHI <_14(5), 0B(6)>
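
A C source shape that could produce GIMPLE like the above is sketched here. It is a hypothetical reconstruction for illustration, not the ssa-pre-34.c testcase.

```c
void *h;

/* The load of h after the conditional store is the one the PHI
   replaces: on the path through the store the value is the constant
   0, on the other path it is the value already loaded for the test.  */
void *
clear_and_read (void)
{
  if (h != 0)
    h = 0;      /* corresponds to <bb 6> */
  return h;     /* corresponds to the load in <bb 7> */
}
```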

If we want to consider the cost of the register copies I think the
only simplistic enough way would be to restrict the special-case to
two incoming edges - we'd assume one register copy is coalesced
leaving one copy from a register or from a constant.

As with every optimization the downstream effects are probably
bigger than what we can locally estimate.

PR tree-optimization/110963
* tree-ssa-pre.cc (do_pre_regular_insertion): Also insert
a PHI node when the expression is available on all edges
and we insert at most one copy from a constant.

* gcc.dg/tree-ssa/ssa-pre-34.c: New testcase.

commit | commitdiff | tree

Richard Biener [Mon, 14 Aug 2023 07:31:18 +0000 (09:31 +0200)]

tree-optimization/110991 - unroll size estimate after vectorization

The following testcase shows that we are bad at identifying inductions
that will be optimized away after vectorizing them because SCEV doesn't
handle vectorized defs. The following rolls a simpler identification
of SSA cycles covering a PHI and an assignment with a binary operator
with a constant second operand.
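
As a scalar analogue of such a cycle (hypothetical, not the cunroll-16.c testcase), consider a variable that is updated each iteration by a binary operation with a constant operand. Once the loop is completely unrolled, every value it takes is a compile-time constant.

```c
/* WEIGHT forms a simple SSA cycle: a PHI in the loop header plus
   one multiplication by a constant.  After complete unrolling its
   values are 1, 2, 4 and 8.  */
int
weighted_sum4 (const int *a)
{
  int sum = 0;
  int weight = 1;
  for (int i = 0; i < 4; ++i)
    {
      sum += a[i] * weight;
      weight *= 2;
    }
  return sum;
}
```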

PR tree-optimization/110991
* tree-ssa-loop-ivcanon.cc (constant_after_peeling): Handle
VIEW_CONVERT_EXPR <op>, handle more simple IV-like SSA cycles
that will end up constant.

* gcc.dg/tree-ssa/cunroll-16.c: New testcase.

commit | commitdiff | tree

Kewen Lin [Tue, 15 Aug 2023 08:01:20 +0000 (03:01 -0500)]

Makefile.in: Make recog.h depend on $(TREE_H) [PR111021]

Commit r14-3093 introduced a random build failure when
building build/gencondmd.cc. r14-3093 makes recog.h
include tree.h, which in turn includes (depends on) some
files that are generated during the build, such as
all-tree.def and tree-check.h. When building
build/gencondmd.cc, the build can fail if these dependencies
are not ready. This patch teaches the Makefile this dependency.

Thanks to Jan-Benedict Glaw for testing this!

PR bootstrap/111021

gcc/ChangeLog:

* Makefile.in (RECOG_H): Add $(TREE_H) as dependence.

commit | commitdiff | tree

Kewen Lin [Tue, 15 Aug 2023 06:36:33 +0000 (01:36 -0500)]

vect: Move VMAT_LOAD_STORE_LANES handlings from final loop nest

Following Richi's suggestion [1], this patch moves the
handling of VMAT_LOAD_STORE_LANES in the final loop nest
of function vectorizable_load to its own loop. Basically
it duplicates the final loop nest, cleans up some useless
setup code for the VMAT_LOAD_STORE_LANES case, and removes
some unreachable code. It also removes the corresponding
handling from the final loop nest.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623329.html

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Move the handlings on
VMAT_LOAD_STORE_LANES in the final loop nest to its own loop,
and update the final nest accordingly.

commit | commitdiff | tree

Kewen Lin [Tue, 15 Aug 2023 06:36:23 +0000 (01:36 -0500)]

vect: Remove several useless VMAT_INVARIANT checks

In function vectorizable_load there is one hunk dedicated
to handling VMAT_INVARIANT that returns early, which means
we should not encounter memory_access_type VMAT_INVARIANT
in the code after it. This patch cleans up several useless
checks on VMAT_INVARIANT. There should be no functional
changes.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Remove some useless checks
on VMAT_INVARIANT.

commit | commitdiff | tree

Pan Li [Sun, 13 Aug 2023 00:56:21 +0000 (08:56 +0800)]

Mode-Switching: Fix SET_SRC ICE for create_pre_exit

In some cases, such as gcc/testsuite/gcc.dg/pr78148.c on RISC-V, the
pattern has only 1 operand when create_pre_exit applies SET_SRC to it,
for example:

(insn 13 9 14 2 (clobber (reg/i:TI 10 a0)) "gcc/testsuite/gcc.dg/pr78148.c":24:1 -1
(expr_list:REG_UNUSED (reg/i:TI 10 a0)
(nil)))

Unfortunately, SET_SRC requires at least 2 operands, so this results in
a segmentation fault. The SH4 part that triggers the segmentation fault
looks valid only when return_copy_pat is a load or something similar.
Thus, this patch fixes it by restricting the use of SET_SRC to SET insns.
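
The shape of the fix can be shown on a toy expression node. This is a standalone model with hypothetical types and names, not GCC's rtx representation: the second operand is read only after confirming that the node really is a SET.

```c
#include <stddef.h>

enum rtx_code { CODE_REG, CODE_SET, CODE_CLOBBER };

struct rtx_model
{
  enum rtx_code code;
  const struct rtx_model *operands[2];  /* CLOBBER uses only operands[0] */
};

/* Return the source operand of a SET, or NULL for any other code,
   instead of blindly reading a second operand that may not exist.  */
const struct rtx_model *
set_src_or_null (const struct rtx_model *pat)
{
  if (pat->code != CODE_SET)
    return NULL;
  return pat->operands[1];
}
```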

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* mode-switching.cc (create_pre_exit): Add SET insn check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/mode-switch-ice-1.c: New test.

Mirror of https://gcc.gnu.org/git/gcc.git
