diagnostics: Delete config pointer before overwriting it

Delete m_client_data_hooks before it is reassigned in
tree_diagnostics_defaults. This fixes a small memory leak in the fortran
frontend, which restores the diagnostics configurations to their default
values with a call to tree_diagnostics_defaults at the end of the main parse
hook.

gcc/ChangeLog:

* tree-diagnostic.cc (tree_diagnostics_defaults): Delete allocated
pointer before overwriting it.
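
The leak pattern fixed above can be sketched in plain C++. This is only an illustration under assumptions: ClientDataHooks, Context and set_defaults are hypothetical stand-ins, not the GCC types or functions touched by the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for an object owned through a raw pointer.
struct ClientDataHooks {
  static int live_count;                 // objects currently alive
  ClientDataHooks() { ++live_count; }
  ~ClientDataHooks() { --live_count; }
};
int ClientDataHooks::live_count = 0;

// Hypothetical context that owns the hooks object.
struct Context {
  ClientDataHooks *hooks = nullptr;
  ~Context() { delete hooks; }
};

// Restore defaults.  May run more than once on the same context, so the
// previous object is deleted before the pointer is overwritten.
// Deleting a null pointer is a no-op, so the first call is safe too.
void set_defaults(Context &ctx)
{
  delete ctx.hooks;
  ctx.hooks = new ClientDataHooks();
}

// Returns the number of live objects after two resets of one context.
int live_after_two_resets()
{
  Context ctx;
  set_defaults(ctx);
  set_defaults(ctx);   // without the delete above, the first object would leak
  assert(ctx.hooks != nullptr);
  return ClientDataHooks::live_count;
}
```

With the delete in place, only one object is alive after the second reset, and none once the context is destroyed.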

LoongArch: Implement 128-bit floating point functions in gcc.

During implementation, float128_type_node is bound with the type "__float128"
so that the compiler can correctly identify the type of the function. The
"q" suffix is associated with the "f128" function, which makes GCC more
flexible to support different user input cases, implementing functions such
as __builtin_{huge_valq, infq, fabsq, copysignq, nanq, nansq}.

gcc/ChangeLog:

* config/loongarch/loongarch-builtins.cc (loongarch_init_builtins):
Associate the __float128 type to float128_type_node so that it can
be recognized by the compiler.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins):
Add the flag "FLOAT128_TYPE" to gcc and associate a function
with the suffix "q" to "f128".
* doc/extend.texi: Added support for 128-bit floating-point functions on
the LoongArch architecture.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/math-float-128.c: New test.

Daily bump.

Fortran: runtime bounds-checking in presence of array constructors [PR31059]

gcc/fortran/ChangeLog:

PR fortran/31059
* trans-array.cc (gfc_conv_ss_startstride): For array bounds checking,
consider also array constructors in expressions, and use their shape.

gcc/testsuite/ChangeLog:

PR fortran/31059
* gfortran.dg/bounds_check_fail_5.f90: New test.

analyzer: Add support of placement new and improved operator new [PR105948,PR94355]

Fixed spurious possibly-NULL warning always tagging along throwing
operator new despite it never returning NULL.
Now operator new is correctly recognized as possibly returning NULL
if and only if it is non-throwing or exceptions have been disabled.
Different standard signatures of operator new are now properly
recognized.

Added support of placement new, so that it is now properly recognized,
and a 'heap_allocated' region is no longer created for it.
Placement new size is also checked and a 'Wanalyzer-allocation-size'
is emitted when relevant, as well as always a 'Wanalyzer-out-of-bounds'.

'operator new' non-throwing variants are detected by checking the types
of the parameters.
Indeed, in a call to new (std::nothrow) () the chosen overload
has signature 'operator new (std::size_t, const std::nothrow_t&)', where the second
parameter is a reference. In a placement new, the second parameter will
always be a void pointer.
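
For reference, the two forms discussed above look as follows in user code. This is a minimal sketch in standard C++, not analyzer code; Point, sum_nothrow and sum_placement are hypothetical names. The non-throwing form selects the standard overload taking a const std::nothrow_t& and may return a null pointer, while the placement form takes a void* and allocates nothing.

```cpp
#include <cassert>
#include <new>

struct Point { int x; int y; };

// Non-throwing form: the result may be null and has to be checked.
int sum_nothrow()
{
  Point *p = new (std::nothrow) Point{1, 2};
  if (p == nullptr)
    return -1;
  const int sum = p->x + p->y;
  delete p;
  return sum;
}

// Placement form: the object is constructed in caller-provided storage,
// so no heap allocation happens and 'delete' must not be used on it.
int sum_placement()
{
  alignas(Point) unsigned char buf[sizeof(Point)];
  Point *p = new (buf) Point{3, 4};
  assert(static_cast<void *>(p) == static_cast<void *>(buf));
  const int sum = p->x + p->y;
  p->~Point();
  return sum;
}
```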

Prior to this patch, some buffers first allocated with 'new', then deleted
and thereafter used would result in a 'Wanalyzer-use-after-free'
warning. However the wording was "use after 'free'" instead of the
expected "use after 'delete'".
This patch fixes this by introducing a new kind of poisoned value,
namely POISON_KIND_DELETED.

Due to how the analyzer sees calls to non-throwing variants of
operator new, dereferencing a pointer freshly allocated in this fashion
caused both a 'Wanalyzer-use-of-uninitialized-value' and a
'Wanalyzer-null-dereference' to be emitted, while only the latter was
relevant. As a result, 'null-dereference' now supersedes
'use-of-uninitialized'.

Signed-off-by: benjamin priour <vultkayn@gcc.gnu.org>
gcc/analyzer/ChangeLog:

PR analyzer/105948
PR analyzer/94355
* analyzer.h (is_placement_new_p): New declaration.
* call-details.cc
(call_details::deref_ptr_arg): New function.
Dereference the argument at given index if possible.
* call-details.h: Declaration of the above function.
* kf-lang-cp.cc (is_placement_new_p): Returns true if the gcall
is recognized as a placement new.
(kf_operator_delete::impl_call_post): Unbinding a region and its
descendants now poisons with POISON_KIND_DELETED.
(register_known_functions_lang_cp): Known function "operator
delete" is now registered only once independently of its number of
arguments.
* region-model.cc (region_model::eval_condition): Now
recursively calls itself if any of the operands is wrapped in a
cast.
* sm-malloc.cc (malloc_state_machine::on_stmt):
Add placement new recognition.
* svalue.cc (poison_kind_to_str): Wording for the new PK.
* svalue.h (enum poison_kind): Add value POISON_KIND_DELETED.

gcc/testsuite/ChangeLog:

PR analyzer/105948
PR analyzer/94355
* g++.dg/analyzer/out-of-bounds-placement-new.C: Added a directive.
* g++.dg/analyzer/placement-new.C: Added tests.
* g++.dg/analyzer/new-2.C: New test.
* g++.dg/analyzer/noexcept-new.C: New test.
* g++.dg/analyzer/placement-new-size.C: New test.

testsuite: Fix analyzer_cpython_plugin.c declarations, PR testsuite/111264

Also, add missing newline at end of file.

PR testsuite/111264
* gcc.dg/plugin/analyzer_cpython_plugin.c: Make declarations
C++11-compatible.

libstdc++: Fix debug-mode tests for constexpr algorithms

These tests started failing at some point:
FAIL: 25_algorithms/copy/debug/constexpr_neg.cc (test for errors, line 49)
FAIL: 25_algorithms/copy/debug/constexpr_neg.cc (test for excess errors)
FAIL: 25_algorithms/equal/debug/constexpr_neg.cc (test for errors, line 47)
FAIL: 25_algorithms/equal/debug/constexpr_neg.cc (test for excess errors)

They only run with -D_GLIBCXX_DEBUG or make check-debug so seem to have
gone unnoticed until now.

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/copy/debug/constexpr_neg.cc: Adjust
expected errors.
* testsuite/25_algorithms/equal/debug/constexpr_neg.cc:
Likewise.

libstdc++: Add -Wno-self-move to two filesystem tests

libstdc++-v3/ChangeLog:

* testsuite/27_io/filesystem/iterators/91067.cc: Add
-Wno-self-move to options.
* testsuite/27_io/filesystem/path/assign/copy.cc: Likewise.

c++: Move new test to 'opt' sub-directory

gcc/testsuite/ChangeLog:

* g++.dg/pr110879.C: Moved to...
* g++.dg/opt/pr110879.C: ...here.

libstdc++: fix memory clobbering in std::vector [PR110879]

Fix ordering to prevent clobbering of class members by a call to deallocate
in _M_realloc_insert and _M_default_append.

Because of recent changes in _M_realloc_insert and _M_default_append,
calls to deallocate were ordered after assignment to class members of
std::vector (in the guard destructor), which is causing said members to
be call-clobbered. This prevents further optimization: the
compiler is unable to move memory reads out of a hot loop in this case.

This patch moves the call before the assignments by putting the guard in
its own block. It also adds a new test for this case. I'm not very happy
with the new test, but I don't know how to properly test this.
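
The ordering described above can be sketched with a toy container. This is an illustration under assumptions, not the libstdc++ code: IntBuffer and its Guard are hypothetical, and only the shape of the fix (guard in its own block, members assigned afterwards) is taken from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

class IntBuffer {
  int *start_ = nullptr;
  std::size_t size_ = 0;
  std::size_t cap_ = 0;

  // Hypothetical RAII guard: frees whatever storage it holds on scope exit.
  struct Guard {
    int *storage;
    ~Guard() { delete[] storage; }
  };

public:
  IntBuffer() = default;
  IntBuffer(const IntBuffer &) = delete;
  IntBuffer &operator=(const IntBuffer &) = delete;
  ~IntBuffer() { delete[] start_; }

  void push_back(int value)
  {
    if (size_ == cap_)
      {
        const std::size_t new_cap = cap_ ? 2 * cap_ : 4;
        int *new_start = new int[new_cap];
        {
          Guard guard{new_start};     // would free the new storage on failure
          std::copy(start_, start_ + size_, new_start);
          guard.storage = start_;     // success: guard now owns the old storage
        }                             // old storage is deallocated here ...
        start_ = new_start;           // ... before the members are written
        cap_ = new_cap;
      }
    start_[size_++] = value;
  }

  std::size_t size() const { return size_; }
  int operator[](std::size_t i) const { assert(i < size_); return start_[i]; }
};

// Pushes 0..n-1 and returns the sum of the stored elements.
int fill_and_sum(int n)
{
  IntBuffer buf;
  for (int i = 0; i < n; ++i)
    buf.push_back(i);
  int sum = 0;
  for (std::size_t i = 0; i < buf.size(); ++i)
    sum += buf[i];
  return sum;
}
```

Because the guard's scope closes before start_ and cap_ are assigned, the deallocation call cannot clobber the values just stored to the members.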

PR libstdc++/110879

libstdc++-v3/ChangeLog:

* include/bits/vector.tcc (_M_realloc_insert): End guard
lifetime just before assignment to class members.
(_M_default_append): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/pr110879.C: New test.

Signed-off-by: Vladimir Palevich <palevichva@gmail.com>

libstdc++: Use std::string::__resize_and_overwrite in std::filesystem

There are a few places in the std::filesystem code that use a string as
a buffer for OS APIs to write to. We can use the new extension
__resize_and_overwrite to avoid redundant initialization of those
buffers.

libstdc++-v3/ChangeLog:

* src/c++17/fs_ops.cc (fs::absolute) [FILESYSTEM_IS_WINDOWS]:
Use __resize_and_overwrite to fill buffer.
(fs::read_symlink) [HAVE_READLINK]: Likewise.
* src/filesystem/ops-common.h (get_temp_directory_from_env)
[FILESYSTEM_IS_WINDOWS]: Likewise.

libstdc++: Use a loop in atomic_ref::compare_exchange_strong [PR111077]

We need to use a loop in std::atomic_ref::compare_exchange_strong in
order to properly implement the C++20 requirement that padding bits do
not participate when checking the value for equality. The variable being
modified by a std::atomic_ref might have an initial value with non-zero
padding bits, so when the __atomic_compare_exchange built-in returns
false we need to check whether that was only because of non-equal
padding bits that are not part of the value representation. If the value
bits differ, it's just a failed compare-exchange. If the value bits are
the same, we need to retry the __atomic_compare_exchange using the value
that was just read by the previous failed call. As noted in the
comments, it's possible for that second try to also fail due to another
thread storing the same value but with differences in padding.

Because it's undefined to access a variable directly while it's held by
a std::atomic_ref, and because std::atomic_ref will only ever store
values with zeroed padding, we know that padding bits will never go from
zero to non-zero during the lifetime of a std::atomic_ref. They can only
go from an initial non-zero state to zero. This means the loop will
terminate, rather than looping indefinitely as padding bits flicker on
and off. In theory users could call __atomic_store etc. directly and
write a value with non-zero padding bits, but we don't need to support
that. Users doing that should ensure they do not write non-zero padding,
to be compatible with our std::atomic_ref's invariants.

This isn't a problem for std::atomic<T>::compare_exchange_strong because
the initial value (and all later stores to the variable) are performed
by the library, so we ensure that stored values always have padding bits
cleared. That means we can simply clear the padding bits of the
'expected' value and we will be comparing two values with equal padding
bits. This means we don't need the loop for std::atomic, so update the
__atomic_impl::__compare_exchange function to take a bool parameter that
says whether it's being used by std::atomic_ref. If not, we can use a
simpler, non-looping implementation.
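
The retry logic described above can be modelled in a few lines. This is a simplified sketch, not the libstdc++ implementation: it assumes a 32-bit word whose low 8 bits are the value and whose upper 24 bits stand in for padding, and the names kValueMask and cas_ignoring_padding are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed layout: low 8 bits are value bits, the rest models padding.
constexpr std::uint32_t kValueMask = 0xFFu;

bool cas_ignoring_padding(std::atomic<std::uint32_t> &obj,
                          std::uint32_t &expected, std::uint32_t desired)
{
  const std::uint32_t want = expected & kValueMask;
  desired &= kValueMask;               // only ever store zeroed "padding"
  std::uint32_t seen = want;
  while (!obj.compare_exchange_strong(seen, desired))
    {
      if ((seen & kValueMask) != want)
        {
          expected = seen & kValueMask;  // value bits differ: genuine failure
          return false;
        }
      // Only the "padding" differed.  SEEN now holds the representation
      // that was just read, so retry the exchange with it.
    }
  return true;
}

// The stored word has value 5 but non-zero "padding"; the exchange should
// still succeed on the second attempt.
bool padding_only_mismatch_succeeds()
{
  std::atomic<std::uint32_t> obj{0xAB000005u};
  std::uint32_t expected = 5;
  const bool ok = cas_ignoring_padding(obj, expected, 7);
  assert(!ok || obj.load() == 7u);
  return ok;
}

// The stored value is 6, not 5; returns the updated 'expected' on failure.
std::uint32_t value_mismatch_reports()
{
  std::atomic<std::uint32_t> obj{0xAB000006u};
  std::uint32_t expected = 5;
  const bool ok = cas_ignoring_padding(obj, expected, 7);
  return ok ? 0xFFFFFFFFu : expected;
}
```

Since every store in this model writes zeroed upper bits, a padding-only mismatch can repeat only if another thread stores the same value in between, which mirrors the termination argument given above.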

libstdc++-v3/ChangeLog:

PR libstdc++/111077
* include/bits/atomic_base.h (__atomic_impl::__compare_exchange):
Add _AtomicRef non-type template parameter and use a loop if it
is true.
(__atomic_impl::compare_exchange_weak): Add _AtomicRef NTTP.
(__atomic_impl::compare_exchange_strong): Likewise.
(atomic_ref::compare_exchange_weak): Use true for NTTP.
(atomic_ref::compare_exchange_strong): Use true for NTTP.
* testsuite/29_atomics/atomic_ref/compare_exchange_padding.cc:
Fix test to not rely on atomic_ref::load() to return an object
with padding preserved.

c++: Fix up mangling of function/block scope static structured bindings [PR111069]

As can be seen on the testcase, we weren't correctly mangling
static/thread_local structured bindings (C++20 feature) at function/block
scope.  The following patch fixes that by using what write_local_name
does for those cases (note, structured binding mangling doesn't use the
standard path because it needs to pass a list of all the identifiers in
the structured binding to the mangling).  In addition to that it fixes
mangling of various helpers which use write_guarded_name (_ZGV*, _ZTH*,
_ZTW*) and kills find_decomp_unqualified_name which for the local names
would be too hard to implement and uses write_guarded_name for structured
binding related _ZGR* names as well.

All the mangled names on the first testcase now match clang++ and my
expectations.
Because the old mangled names were plain wrong (they mangled the same as
structured binding at global scope and resulted in assembly errors if there
was more than one static structured binding with the same identifiers in
the same (or another) function), I think we don't need to play with another
mangling ABI level which turns on/off the old broken way.

In addition to that the patch starts to emit abi-tags into the mangle_decomp
produced names when needed and emits a -Wabi warning for that as well.
To make that work, I had to move cp_maybe_mangle_decomp calls from before
cp_finish_decl into the middle of cp_finish_decl after type is deduced and
maybe_commonize_var (which also had to be changed not to ignore structured
bindings) is called but before anything might need a mangled name for the
decl, so a new cp_decomp structure is passed to cp_finish_decl; various
other structured binding related functions have been changed to pass
pointer to that around instead of passing a tree and unsigned int separately.

On decomp9.C, there is a
_ZZ3barI1TB3quxEivEDC1o1pEB3qux
(g++) vs.
_ZZ3barI1TB3quxEivEDC1o1pE
(clang++) mangling difference, but that seems to be a clang++ bug and happens
also with normal static block vars, doesn't need structured bindings.

2023-09-01  Jakub Jelinek  <jakub@redhat.com>

PR c++/111069
gcc/
* common.opt (fabi-version=): Document version 19.
* doc/invoke.texi (-fabi-version=): Likewise.
gcc/c-family/
* c-opts.cc (c_common_post_options): Change latest_abi_version to 19.
gcc/cp/
* cp-tree.h (determine_local_discriminator): Add NAME argument with
NULL_TREE default.
(struct cp_decomp): New type.
(cp_finish_decl): Add DECOMP argument defaulted to nullptr.
(cp_maybe_mangle_decomp): Remove declaration.
(cp_finish_decomp): Add cp_decomp * argument, remove tree and unsigned
args.
(cp_convert_range_for): Likewise.
* decl.cc (determine_local_discriminator): Add NAME argument, use it
if non-NULL, otherwise compute it the old way.
(maybe_commonize_var): Don't return early for structured bindings.
(cp_finish_decl): Add DECOMP argument, if non-NULL, call
cp_maybe_mangle_decomp.
(cp_maybe_mangle_decomp): Make it static with a forward declaration.
Call determine_local_discriminator.  Replace FIRST and COUNT arguments
with DECOMP argument.
(cp_finish_decomp): Replace FIRST and COUNT arguments with DECOMP
argument.
* mangle.cc (find_decomp_unqualified_name): Remove.
(write_unqualified_name): Don't call find_decomp_unqualified_name.
(mangle_decomp): Handle mangling of static function/block scope
structured bindings.  Don't call decl_mangling_context twice.  Call
check_abi_tags, call write_abi_tags for abi version >= 19 and emit
-Wabi warnings if needed.
(write_guarded_var_name): Handle structured bindings.
(mangle_ref_init_variable): Use write_guarded_var_name.
* parser.cc (cp_parser_range_for): Adjust do_range_for_auto_deduction
and cp_convert_range_for callers.
(do_range_for_auto_deduction): Replace DECOMP_FIRST_NAME and
DECOMP_CNT arguments with DECOMP.  Adjust cp_finish_decomp caller.
(cp_convert_range_for): Replace DECOMP_FIRST_NAME and
DECOMP_CNT arguments with DECOMP.  Don't call cp_maybe_mangle_decomp,
adjust cp_finish_decl and cp_finish_decomp callers.
(cp_parser_decomposition_declaration): Don't call
cp_maybe_mangle_decomp, adjust cp_finish_decl and cp_finish_decomp
callers.
(cp_convert_omp_range_for): Adjust do_range_for_auto_deduction
and cp_finish_decomp callers.
(cp_finish_omp_range_for): Don't call cp_maybe_mangle_decomp,
adjust cp_finish_decl and cp_finish_decomp callers.
* pt.cc (tsubst_omp_for_iterator): Adjust tsubst_decomp_names
caller.
(tsubst_decomp_names): Replace FIRST and CNT arguments with DECOMP.
(tsubst_expr): Don't call cp_maybe_mangle_decomp, adjust
tsubst_decomp_names, cp_finish_decl, cp_finish_decomp and
cp_convert_range_for callers.
gcc/testsuite/
* g++.dg/cpp2a/decomp8.C: New test.
* g++.dg/cpp2a/decomp9.C: New test.
* g++.dg/abi/macro0.C: Expect __GXX_ABI_VERSION 1019 rather than
1018.

testsuite: Fix vectcond-1.C FAIL on i686-linux [PR19832]

This test FAILs on i686-linux with
.../gcc/testsuite/g++.dg/opt/vectcond-1.C:8:57: warning: MMX vector return without MMX enabled changes the ABI [-Wpsabi]
.../gcc/testsuite/g++.dg/opt/vectcond-1.C:17:12: warning: MMX vector argument without MMX enabled changes the ABI [-Wpsabi]
excess warning. Fixed by using -Wno-psabi.

2023-09-01 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/19832
* g++.dg/opt/vectcond-1.C: Add -Wno-psabi to dg-options.

testsuite: Fix up pr110915* tests on i686-linux [PR110915]

These tests FAIL on i686-linux, with
.../gcc/testsuite/gcc.dg/pr110915-1.c:8:1: warning: MMX vector return without MMX enabled changes the ABI [-Wpsabi]
.../gcc/testsuite/gcc.dg/pr110915-1.c:7:15: warning: MMX vector argument without MMX enabled changes the ABI [-Wpsabi]
excess warnings.  I've added -Wno-psabi to quiet that up, plus I think
it is undesirable to define macros like vector before including C library
headers in case the header would use that identifier in non-obfuscated
form somewhere.

2023-09-01  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/110915
* gcc.dg/pr110915-1.c: Add -Wno-psabi to dg-options.  Move vector
macro definition after limits.h inclusion.
* gcc.dg/pr110915-2.c: Likewise.
* gcc.dg/pr110915-3.c: Likewise.
* gcc.dg/pr110915-4.c: Likewise.
* gcc.dg/pr110915-5.c: Likewise.
* gcc.dg/pr110915-6.c: Likewise.
* gcc.dg/pr110915-7.c: Likewise.
* gcc.dg/pr110915-8.c: Likewise.
* gcc.dg/pr110915-9.c: Likewise.
* gcc.dg/pr110915-10.c: Likewise.
* gcc.dg/pr110915-11.c: Likewise.
* gcc.dg/pr110915-12.c: Likewise.

RISC-V: Add conditional autovec convert(INT<->FP) patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_<optab><mode><vconvert>):
New combine pattern.
(*cond_<float_cvt><vconvert><mode>): Ditto.
(*cond_<optab><vnconvert><mode>): Ditto.
(*cond_<float_cvt><vnconvert><mode>): Ditto.
(*cond_<optab><mode><vnconvert>): Ditto.
(*cond_<float_cvt><mode><vnconvert>2): Ditto.
* config/riscv/autovec.md (<optab><mode><vconvert>2): Adjust.
(<float_cvt><vconvert><mode>2): Adjust.
(<optab><vnconvert><mode>2): Adjust.
(<float_cvt><vnconvert><mode>2): Adjust.
(<optab><mode><vnconvert>2): Adjust.
(<float_cvt><mode><vnconvert>2): Adjust.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add INT->FP extend.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-2.c: New test.

RISC-V: Add conditional autovec convert(FP<->FP) patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_extend<v_double_trunc><mode>):
New combine pattern.
(*cond_trunc<mode><v_double_trunc>): Ditto.
* config/riscv/autovec.md: Adjust.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add FP extend.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-2.c: New test.

RISC-V: Add conditional autovec convert(INT<->INT) patterns

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_<optab><v_double_trunc><mode>):
New combine pattern.
(*cond_<optab><v_quad_trunc><mode>): Ditto.
(*cond_<optab><v_oct_trunc><mode>): Ditto.
(*cond_trunc<mode><v_double_trunc>): Ditto.
* config/riscv/autovec.md (<optab><v_quad_trunc><mode>2): Adjust.
(<optab><v_oct_trunc><mode>2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: Adjust.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-2.c: New test.

RISC-V: Adjust expand_cond_len_{unary,binop,op} api

This patch changes expand_cond_len_{unary,binop}'s argument `rtx_code code`
to `unsigned icode` and uses the icode directly to determine whether the
rounding_mode operand is required.

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust.
* config/riscv/riscv-protos.h (expand_cond_len_unop): Ditto.
(expand_cond_len_binop): Ditto.
* config/riscv/riscv-v.cc (needs_fp_rounding): Ditto.
(expand_cond_len_op): Ditto.
(expand_cond_len_unop): Ditto.
(expand_cond_len_binop): Ditto.
(expand_cond_len_ternop): Ditto.

libstdc++: Use dg-require-filesystem-ts in link test

This test expects to be able to link, which fails if there are undefined
references to chdir, mkdir etc. in fs_ops.o in the libstdc++.a archive.

libstdc++-v3/ChangeLog:

* testsuite/27_io/filesystem/path/108636.cc: Add dg-require for
filesystem support.

libstdc++: Avoid useless dependency on read_symlink from tzdb

chrono::tzdb::current_zone uses filesystem::read_symlink, which creates
a dependency on the fs_ops.o object in libstdc++.a, which then creates
dependencies on several OS functions if --gc-sections isn't used. For
more details see PR libstdc++/104167 comment 8 and comment 11.

In the cases where that causes linker failures, we probably don't have
readlink anyway, so the filesystem::read_symlink call will always fail.
Repeat the preprocessor conditions for filesystem::read_symlink in the
body of chrono::tzdb::current_zone so that we don't create a
dependency on fs_ops.o for a function that will always fail.

libstdc++-v3/ChangeLog:

* src/c++20/tzdb.cc (tzdb::current_zone): Check configure macros
for POSIX readlink before using filesystem::read_symlink.

libstdc++: Make --enable-libstdcxx-backtrace=auto default to yes

This causes libstdc++_libbacktrace.a to be built by default. This might
fail on some targets, in which case we can make the 'auto' choice expand
to either 'yes' or 'no' depending on the target.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ENABLE_BACKTRACE): Default to yes.
* configure: Regenerate.

RISC-V: Enable VECT_COMPARE_COSTS by default

Since we have added the COST framework, enable VECT_COMPARE_COSTS by default.

Also, add 16/32/64 to provide more choices for COST comparison.

This patch doesn't change any behavior in the current testsuite since we are using
the default COST model.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (autovectorize_vector_modes): Enable
VECT_COMPARE_COSTS by default.

RISC-V: Add vec_extract for BI -> QI.

This patch adds a vec_extract expander that extracts a QImode from a
vector mask mode.  In doing so, it helps recognize a "live
operation"/extract last idiom for mask modes.  It fixes the ICE in
tree-vect-live-6.c by circumventing the fallback code in
extract_bit_field_1.  The problem there is still latent, though, and
needs to be addressed separately.

gcc/ChangeLog:

* config/riscv/autovec.md (vec_extract<mode>qi): New expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/live-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/live_run-2.c: New test.

testsuite/vect: Make match patterns more accurate.

On some targets we fail to vectorize with the first type the vectorizer
tries but succeed with the second. This patch changes several regex
patterns to reflect that behavior.

Before we would look for a single occurrence of e.g.
"vect_recog_dot_prod_pattern" but would possibly have two (one for each
attempted mode). The new pattern tries to match sequences where we
first have a "vect_recog_dot_prod_pattern" and a "succeeded" afterwards
while making sure there is no "failed" or "Re-trying" in between.
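
The matching rule described above can be modelled with plain string searches. This is only a sketch: the real tests use DejaGnu scan directives with regular expressions, and the function name and any dump text fed to it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if some occurrence of PATTERN is followed by "succeeded"
// with neither "failed" nor "Re-trying" in between.
bool pattern_then_succeeded(const std::string &dump, const std::string &pattern)
{
  assert(!pattern.empty());   // an empty pattern would never advance
  const std::size_t npos = std::string::npos;
  std::size_t pos = 0;
  while ((pos = dump.find(pattern, pos)) != npos)
    {
      const std::size_t after = pos + pattern.size();
      const std::size_t ok = dump.find("succeeded", after);
      if (ok == npos)
        return false;         // no later occurrence can succeed either
      const std::size_t failed = dump.find("failed", after);
      const std::size_t retry = dump.find("Re-trying", after);
      if ((failed == npos || failed > ok) && (retry == npos || retry > ok))
        return true;
      pos = after;            // try the next occurrence of the pattern
    }
  return false;
}
```

A dump where the first attempt fails and the second succeeds is accepted through the second occurrence, while a dump whose only occurrence is followed by a failure is rejected.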

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-outer-4c-big-array.c: Adjust regex pattern.
* gcc.dg/vect/vect-reduc-dot-s16a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-s8a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-s8b.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u16a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u16b.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u8a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u8b.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1a.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1b-big-array.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1c-big-array.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-2a.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-2b-big-array.c: Ditto.
* gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c: Ditto.

RISC-V: Add dynamic LMUL compile option

We are going to support dynamic LMUL.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Add
dynamic enum.
* config/riscv/riscv.opt: Add dynamic compile option.

libstdc++: Fix how chrono::parse handles errors for time-of-day values

We fail to diagnose an error and extract an incorrect time for cases
like "25:59" >> parse("%H:%M", mins). The bad "25" hour value gets
ignored (on the basis that we might not care about it if trying to
extract something like a weekday or a month name), but then when we get
to the end of the function we think we have a valid time from "59" and
so the result is 00:59.

The problem is that the '__bad_h' value is used for "no hour value read
yet" as well as "bad hour value read". If we just set __h = __bad_h and
continue, we can't tell later that we read an invalid hour.

The fix is to set failbit early when we're trying to extract a
time-of-day (e.g. duration or time_point) and we encounter an invalid
hour, minute, or second value. We can still delay other error checking
to the end.
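
The sentinel problem described above can be reproduced with a toy "HH:MM" parser. This is a simplified model, not the libstdc++ parser: parse_hh_mm, kBadHour and the fail_early flag are hypothetical, and the function returns minutes since midnight instead of setting stream state.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical sentinel used for both "no hour read" and "bad hour read".
constexpr int kBadHour = -1;

std::optional<int> parse_hh_mm(const std::string &text, bool fail_early)
{
  auto digit = [](char c) { return c >= '0' && c <= '9'; };
  if (text.size() != 5 || text[2] != ':'
      || !digit(text[0]) || !digit(text[1])
      || !digit(text[3]) || !digit(text[4]))
    return std::nullopt;

  int hour = (text[0] - '0') * 10 + (text[1] - '0');
  const int minute = (text[3] - '0') * 10 + (text[4] - '0');
  assert(hour >= 0 && hour <= 99);   // two decimal digits

  if (hour > 23)
    {
      if (fail_early)
        return std::nullopt;         // fixed behavior: report the error now
      hour = kBadHour;               // old behavior: error becomes "not read"
    }
  if (minute > 59)
    return std::nullopt;
  if (hour == kBadHour)
    hour = 0;                        // "no hour given" defaults to zero
  return hour * 60 + minute;
}
```

Without the early failure, "25:59" yields 59 minutes (00:59), which is the misparse described above; with it, the input is rejected.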

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (_Parser::operator()): Set failbit
early if invalid values are read when _M_need & _TimeOfDay is
non-zero.
* testsuite/std/time/parse.cc: Check that "25:59" cannot be
parsed for "%H:%M".

libstdc++: Do not allow chrono::parse to overflow for %C [PR111162]

libstdc++-v3/ChangeLog:

PR libstdc++/111162
* include/bits/chrono_io.h (_Parser::operator()): Check %C
values are in range of year::min() to year::max().
* testsuite/std/time/parse.cc: Check out of range centuries.

libstdc++: Simplify __format::_Sink::_M_reset

Using an offset as the second argument instead of an iterator makes it
easier for callers, as they don't need to create an lvalue span in order
to get an iterator from it for the _M_reset call.

libstdc++-v3/ChangeLog:

* include/std/format (__format::_Sink::_M_reset): Change second
argument from iterator to offset.

RISC-V: Support FP ADD/SUB/MUL/DIV autovec for VLS mode

This patch allows VLS mode autovec for the
floating-point binary operations ADD/SUB/MUL/DIV.

Given the code example below:

void test (float *out, float *in1, float *in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = in1[i] + in2[i];
}

Before this patch:
test:
  csrr a4,vlenb
  slli a4,a4,1
  li   a5,128
  bleu a5,a4,.L38
  mv   a5,a4
.L38:
  vsetvli  zero,a5,e32,m8,ta,ma
  vle32.v  v16,0(a1)
  vsetvli  a4,zero,e32,m8,ta,ma
  vmv.v.i  v8,0
  vsetvli  zero,a5,e32,m8,tu,ma
  vle32.v  v24,0(a2)
  vfadd.vv v8,v24,v16
  vse32.v  v8,0(a0)
  ret

After this patch:
test:
  li       a5,128
  vsetvli  zero,a5,e32,m1,ta,ma
  vle32.v  v1,0(a2)
  vle32.v  v2,0(a1)
  vfadd.vv v1,v1,v2
  vse32.v  v1,0(a0)
  ret

Please note this patch also fixes the execution failures of the
vect test cases below.

* vect-alias-check-10.c
* vect-alias-check-11.c
* vect-alias-check-12.c
* vect-alias-check-14.c

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-vls.md (<optab><mode>3): New pattern for
vls floating-point autovec.
* config/riscv/vector-iterators.md: New iterator for
floating-point V and VLS.
* config/riscv/vector.md: Add VLS to floating-point binop.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h:
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-3.c: New test.

MATCH [PR19832]: Optimize some `(a != b) ? a OP b : c`

This patch adds the following match patterns to optimize these:
/* (a != b) ? (a - b) : 0 -> (a - b) */
/* (a != b) ? (a ^ b) : 0 -> (a ^ b) */
/* (a != b) ? (a & b) : a -> (a & b) */
/* (a != b) ? (a | b) : a -> (a | b) */
/* (a != b) ? min(a,b) : a -> min(a,b) */
/* (a != b) ? max(a,b) : a -> max(a,b) */
/* (a != b) ? (a * b) : (a * a) -> (a * b) */
/* (a != b) ? (a + b) : (a + a) -> (a + b) */
/* (a != b) ? (a + b) : (2 * a) -> (a + b) */
Note currently only integer types (including vector types)
are handled. Floating point types can be added later on.
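
The identities behind these patterns can be checked by brute force over a small integer range. This is only a sanity check of the arithmetic, not the match.pd code; identities_hold is a hypothetical helper, and the range has to stay small enough that a * b does not overflow.

```cpp
#include <algorithm>
#include <cassert>

// Returns true if every listed identity holds for all a, b in [lo, hi].
bool identities_hold(int lo, int hi)
{
  assert(lo <= hi);
  for (int a = lo; a <= hi; ++a)
    for (int b = lo; b <= hi; ++b)
      {
        const bool ne = (a != b);
        if ((ne ? (a - b) : 0) != (a - b)) return false;
        if ((ne ? (a ^ b) : 0) != (a ^ b)) return false;
        if ((ne ? (a & b) : a) != (a & b)) return false;
        if ((ne ? (a | b) : a) != (a | b)) return false;
        if ((ne ? std::min(a, b) : a) != std::min(a, b)) return false;
        if ((ne ? std::max(a, b) : a) != std::max(a, b)) return false;
        if ((ne ? (a * b) : (a * a)) != (a * b)) return false;
        if ((ne ? (a + b) : (a + a)) != (a + b)) return false;
        if ((ne ? (a + b) : (2 * a)) != (a + b)) return false;
      }
  return true;
}
```

In each case the "else" arm is exactly what the "then" arm evaluates to when a == b, which is why the condition can be dropped.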

OK? Bootstrapped and tested on x86_64-linux-gnu.

The first pattern still shows up in GCC in cse.c's preferable
function, which was the original motivation for this patch.

PR tree-optimization/19832

gcc/ChangeLog:

* match.pd: Add pattern to optimize
`(a != b) ? a OP b : c`.

gcc/testsuite/ChangeLog:

* g++.dg/opt/vectcond-1.C: New test.
* gcc.dg/tree-ssa/phi-opt-same-1.c: New test.

LoongArch: Fix bug in loongarch_emit_stack_tie [PR110484].

This bug may result in implicit references to $fp when frame_pointer_needed is false,
causing regs_ever_live[$fp] to be true when $fp is not explicitly used,
resulting in $fp being used as the target replacement register in the rnreg pass.

The bug originates from SPEC2017 541.leela_r(-flto).

gcc/ChangeLog:

PR target/110484
* config/loongarch/loongarch.cc (loongarch_emit_stack_tie): Use the
frame_pointer_needed to determine whether to use the $fp register.

Co-authored-by: Guo Jie <guojie@loongson.cn>

Daily bump.

MATCH: extend min_value/max_value match to vectors

This simple patch extends the min_value/max_value match to vector integer types.
Using uniform_integer_cst_p makes this easy.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

The testcases pr110915-*.c are the same as pr88784-*.c except using vector
types instead.

PR tree-optimization/110915

gcc/ChangeLog:

* match.pd (min_value, max_value): Extend to vector constants.

gcc/testsuite/ChangeLog:

* gcc.dg/pr110915-1.c: New test.
* gcc.dg/pr110915-10.c: New test.
* gcc.dg/pr110915-11.c: New test.
* gcc.dg/pr110915-12.c: New test.
* gcc.dg/pr110915-2.c: New test.
* gcc.dg/pr110915-3.c: New test.
* gcc.dg/pr110915-4.c: New test.
* gcc.dg/pr110915-5.c: New test.
* gcc.dg/pr110915-6.c: New test.
* gcc.dg/pr110915-7.c: New test.
* gcc.dg/pr110915-8.c: New test.
* gcc.dg/pr110915-9.c: New test.

Darwin: homogenize spelling of macOS

gcc/ChangeLog:
* config.in: Regenerate.
* config/darwin-c.cc: Change spelling to macOS.
* config/darwin-driver.cc: Likewise.
* config/darwin.h: Likewise.
* configure.ac: Likewise.
* doc/contrib.texi: Likewise.
* doc/extend.texi: Likewise.
* doc/invoke.texi: Likewise.
* doc/plugins.texi: Likewise.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Change spelling to macOS.
* plugin.cc: Likewise.

gcc/analyzer/ChangeLog:
* kf.cc: Change spelling to macOS.

gcc/c-family/ChangeLog:
* c.opt: Change spelling to macOS.

gcc/fortran/ChangeLog:
* gfortran.texi: Likewise.

gcc/jit/ChangeLog:
* jit-playback.cc: Change spelling to macOS.

gcc/objc/ChangeLog:
* objc-act.cc: Change spelling to macOS.

RISC-V: Support rounding mode for VFNMADD/VFNMACC autovec

There will be a case like below for intrinsic and autovec combination.

vfadd RTZ   <- intrinsic static rounding
vfnmadd     <- autovec/autovec-opt

The autovec generated vfnmadd should take DYN mode, and the
frm must be restored before the vfnmadd insn. This patch
fixes this issue by:

* Adding the frm operand to the autovec/autovec-opt pattern.
* Setting the frm_mode attr to DYN.

Thus, the frm flow when combining autovec and intrinsic should be:

+------------
| frrm  a5
| ...
| fsrmi 4
| vfadd       <- intrinsic static rounding.
| ...
| fsrm  a5
| vfnmadd     <- autovec/autovec-opt
| ...
+------------
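
The save / set / restore shape of this flow can be sketched in portable C++
with <cfenv>. This is only an analogy to the frrm / fsrmi / fsrm sequence, not
the code GCC emits; the class and function names are hypothetical, and
FE_TOWARDZERO stands in for the static rounding mode.

```cpp
#include <cfenv>

// Saves the current rounding mode (like frrm), installs a static one
// (like fsrmi), and restores the saved mode on scope exit (like fsrm).
class ScopedRounding {
 public:
  explicit ScopedRounding(int mode) : saved_(std::fegetround()) {
    std::fesetround(mode);
  }
  ~ScopedRounding() { std::fesetround(saved_); }
  ScopedRounding(const ScopedRounding&) = delete;
  ScopedRounding& operator=(const ScopedRounding&) = delete;

 private:
  int saved_;
};

// Returns the rounding mode that is active inside the static region.
int mode_inside_static_region() {
  ScopedRounding guard(FE_TOWARDZERO);
  return std::fegetround();
}
```

Code that runs after the guard goes out of scope sees the dynamic mode again,
which is the property the autovec generated insn relies on.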

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfnmadd/vfnmacc.
* config/riscv/autovec.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-4.c: New test.

RISC-V: Support rounding mode for VFNMSAC/VFNMSUB autovec

There will be a case like below for intrinsic and autovec combination.

vfadd RTZ   <- intrinsic static rounding
vfnmsub     <- autovec/autovec-opt

The autovec generated vfnmsub should take DYN mode, and the
frm must be restored before the vfnmsub insn. This patch
fixes this issue by:

* Adding the frm operand to the autovec/autovec-opt pattern.
* Setting the frm_mode attr to DYN.

Thus, the frm flow when combining autovec and intrinsic should be:

+------------
| frrm  a5
| ...
| fsrmi 4
| vfadd       <- intrinsic static rounding.
| ...
| fsrm  a5
| vfnmsub     <- autovec/autovec-opt
| ...
+------------

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfnmsac/vfnmsub
* config/riscv/autovec.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-3.c: New test.

aarch64: Fix return register handling in untyped_call

While working on another patch, I hit a problem with the aarch64
expansion of untyped_call.  The expander emits the usual:

  (set (mem ...) (reg resN))

instructions to store the result registers to memory, but it didn't
say in RTL where those resN results came from.  This eventually led
to a failure of gcc.dg/torture/stackalign/builtin-return-2.c,
via regrename.

This patch turns the untyped call from a plain call to a call_value,
to represent that the call returns (or might return) a useful value.
The patch also uses a PARALLEL return rtx to represent all the possible
return registers.

gcc/
* config/aarch64/aarch64.md (untyped_call): Emit a call_value
rather than a call.  List each possible destination register
in the call pattern.

rs6000: Update instruction counts to match vec_* calls [PR111228]

Commit r14-3258-ge7a36e4715c716 increased the amount of folding we perform,
leading to better code.  Update the expected instruction counts to match the
changes.

2023-08-31  Peter Bergner  <bergner@linux.ibm.com>

gcc/testsuite/
PR testsuite/111228
* gcc.target/powerpc/fold-vec-logical-ors-char.c: Update instruction
counts to match the number of associated vec_* built-in calls.
* gcc.target/powerpc/fold-vec-logical-ors-int.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-ors-longlong.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-ors-short.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-other-char.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-other-int.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-other-longlong.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-other-short.c: Likewise.

RISC-V: Support rounding mode for VFMSAC/VFMSUB autovec

There will be a case like below for intrinsic and autovec combination.

vfadd RTZ   <- intrinsic static rounding
vfmsub      <- autovec/autovec-opt

The autovec generated vfmsub should take DYN mode, and the
frm must be restored before the vfmsub insn. This patch
fixes this issue by:

* Adding the frm operand to the autovec/autovec-opt pattern.
* Setting the frm_mode attr to DYN.

Thus, the frm flow when combining autovec and intrinsic should be:

+------------
| frrm  a5
| ...
| fsrmi 4
| vfadd       <- intrinsic static rounding.
| ...
| fsrm  a5
| vfmsub      <- autovec/autovec-opt
| ...
+------------

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfmsac/vfmsub
* config/riscv/autovec.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-2.c: New test.

RISC-V: Support rounding mode for VFMADD/VFMACC autovec

There will be a case like below for intrinsic and autovec combination.

vfadd RTZ   <- intrinsic static rounding
vfmadd      <- autovec/autovec-opt

The autovec generated vfmadd should take DYN mode, and the
frm must be restored before the vfmadd insn. This patch
fixes this issue by:

* Adding the frm operand to the vfmadd/vfmacc autovec/autovec-opt pattern.
* Setting the frm_mode attr to DYN.

Thus, the frm flow when combining autovec and intrinsic should be:

+------------
| frrm  a5
| ...
| fsrmi 4
| vfadd       <- intrinsic static rounding.
| ...
| fsrm  a5
| vfmadd      <- autovec/autovec-opt
| ...
+------------

However, we leverage unspec instead of use to consume the FRM register
because of some restrictions in the combine pass. Some code paths of
try_combine may require XVECLEN(pat, 0) == 2 for recog_for_combine,
and adding a new use would make XVECLEN(pat, 0) == 3 and cause the
vfwmacc optimization to fail, for example in the tests
widen-complicate-5.c and widen-8.c.

Finally, there are other fma cases; they will be covered in
subsequent patches.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfmadd/vfmacc.
* config/riscv/autovec.md: Ditto.
* config/riscv/vector-iterators.md: Add UNSPEC_VFFMA.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-1.c: New test.

middle-end/111253 - partly revert r11-6508-gabb1b6058c09a7

The following keeps dumping the SSA def stmt RHS during diagnostic
reporting only for gimple_assign_single_p defs, which means
memory loads.  This avoids diagnostics containing PHI nodes
like

  warning: 'realloc' called on pointer '*_42 = PHI <lcs.14_40(29), lcs.19_48(30)>.t_mem_caches' with nonzero offset 40

and instead restores the previous behavior:

  warning: 'realloc' called on pointer '*<unknown>.t_mem_caches' with nonzero offset 40

PR middle-end/111253
gcc/c-family/
* c-pretty-print.cc (c_pretty_printer::primary_expression):
Only dump gimple_assign_single_p SSA def RHS.

gcc/testsuite/
* gcc.dg/Wfree-nonheap-object-7.c: New testcase.

RISC-V: Add vector_scalar_shift_operand

The vector shift immediates happen to have the same constraints as some
of the CSR-related operands, but it's a different usage. This adds a
name for them, so I don't get confused again next time.

gcc/ChangeLog:

* config/riscv/autovec.md (shifts): Use
vector_scalar_shift_operand.
* config/riscv/predicates.md (vector_scalar_shift_operand): New
predicate.

RISC-V: Add Vector cost model framework for RVV

Currently, RVV vectorization only supports picking LMUL according to
the compile option --param=riscv-autovec-lmul=, which is not ideal.

The compiler should be able to pick the optimal LMUL/vectorization factor to
vectorize the loop according to the loop_vec_info and SSA-based register
pressure analysis.

The current GCC cost model provides an approach by which we
can choose the LMUL/vectorization factor by adjusting the COST.

This patch just adds the minimum COST model framework, which still
applies the default cost model (the generated vector code is unchanged).

Regression tests all passed with no differences.

gcc/ChangeLog:

* config.gcc: Add vector cost model framework for RVV.
* config/riscv/riscv.cc (riscv_vectorize_create_costs): Ditto.
(TARGET_VECTORIZE_CREATE_COSTS): Ditto.
* config/riscv/t-riscv: Ditto.
* config/riscv/riscv-vector-costs.cc: New file.
* config/riscv/riscv-vector-costs.h: New file.

rs6000: Don't allow AltiVec address in movoo & movxo pattern [PR110411]

There are no instructions that do traditional AltiVec addresses (i.e.
with the low four bits of the address masked off) for OOmode and XOmode
objects. The solution is to modify the constraints used in the movoo and
movxo pattern to disallow these types of addresses, which assists LRA in
resolving this issue. Furthermore, the mode size 16 check has been
removed in vsx_quad_dform_memory_operand to allow OOmode and XOmode, and
quad_address_p already handles less than size 16.

2023-08-31 Jeevitha Palanisamy <jeevitha@linux.ibm.com>

gcc/
PR target/110411
* config/rs6000/mma.md (define_insn_and_split movoo): Disallow
AltiVec address operands.
(define_insn_and_split movxo): Likewise.
* config/rs6000/predicates.md (vsx_quad_dform_memory_operand): Remove
redundant mode size check.

gcc/testsuite/
PR target/110411
* gcc.target/powerpc/pr110411-1.c: New testcase.
* gcc.target/powerpc/pr110411-2.c: New testcase.

RISC-V: Change vsetvl tail and mask policy to default policy

This patch changes the vsetvl policy to the default policy
(returned by get_prefer_mask_policy and get_prefer_tail_policy) instead
of a fixed policy. Any policy may now be returned, allowing a change to agnostic
or undisturbed. In the future, users may be able to control the default
policy, such as keeping agnostic, by compiler options.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (IS_AGNOSTIC): Move to here.
* config/riscv/riscv-v.cc (gen_no_side_effects_vsetvl_rtx):
Change to default policy.
* config/riscv/riscv-vector-builtins-bases.cc: Change to default policy.
* config/riscv/riscv-vsetvl.h (IS_AGNOSTIC): Delete.
* config/riscv/riscv.cc (riscv_print_operand): Use IS_AGNOSTIC to test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/binop_vx_constraint-171.c: Adjust.
* gcc.target/riscv/rvv/base/binop_vx_constraint-173.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-24.c: New test.

Fix gcc.dg/tree-ssa/forwprop-42.c

The testcase requires hardware support for V2DImode vectors because
otherwise we do not rewrite inserts via BIT_FIELD_REF to
BIT_INSERT_EXPR. There's no effective target for this so the
following makes the testcase x86 specific, requiring and enabling SSE2.

* gcc.dg/tree-ssa/forwprop-42.c: Move ...
* gcc.target/i386/pr111228.c: ... here. Enable SSE2.

RISC-V: Refactor and clean emit_{vlmax,nonvlmax}_xxx functions

This patch refactors the code of the emit_{vlmax,nonvlmax}_xxx functions.
These functions are used to generate RVV insns. There are currently 31
such functions and a few duplicates. So many functions are
needed because there are many types of RVV instructions: there are
patterns that don't have a mask operand, patterns that don't have a merge
operand, patterns that don't need a tail policy operand, etc.

Previously there was the insn_type enum, but its value was just used
to indicate how many operands were passed in by the caller. The rest of
the operand information is scattered throughout these functions.
For example, emit_vlmax_fp_insn indicates that a rounding mode operand
of FRM_DYN should also be passed, and emit_vlmax_merge_insn means that
there is no mask operand or mask policy operand.

I introduced a new enum insn_flags to indicate some properties of these
RVV patterns. These insn_flags are then used to define the insn_type enum.
For example, the definition of WIDEN_TERNARY_OP is:

  WIDEN_TERNARY_OP = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P
                       | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P | TERNARY_OP_P,

These flags mean the RVV pattern has no merge operand. These flags apply only
to vwmacc instructions. After defining the desired insn_type, all the
emit_{vlmax,nonvlmax}_xxx functions are unified into three functions:

  emit_vlmax_insn (icode, insn_flags, ops);
  emit_nonvlmax_insn (icode, insn_flags, ops, vl);
  emit_vlmax_insn_lra (icode, insn_flags, ops, vl);

Users can then select the appropriate insn_type and the appropriate emit_xxx
function for RVV pattern generation as needed.
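
The flag composition described above can be sketched as a bit-flag enum. This
is a minimal illustration, not the definition in riscv-protos.h: the bit
positions, the HAS_MERGE_P name, and the has_flag helper are assumptions made
for the example; only the flag names used by WIDEN_TERNARY_OP come from the
description above.

```cpp
#include <cstdint>

// Each flag records one property of an RVV pattern (bit values assumed).
enum insn_flags : std::uint32_t {
  HAS_DEST_P           = 1u << 0,
  HAS_MASK_P           = 1u << 1,
  USE_ALL_TRUES_MASK_P = 1u << 2,
  HAS_MERGE_P          = 1u << 3,  // hypothetical name, for contrast only
  TDEFAULT_POLICY_P    = 1u << 4,
  MDEFAULT_POLICY_P    = 1u << 5,
  TERNARY_OP_P         = 1u << 6,
};

// An insn_type is an OR of the flags that apply to the pattern.
constexpr std::uint32_t WIDEN_TERNARY_OP =
    HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P
    | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P | TERNARY_OP_P;

// Tests whether an insn_type carries a given property.
constexpr bool has_flag(std::uint32_t type, insn_flags flag) {
  return (type & flag) != 0;
}
```

A single emit function can then branch on has_flag instead of needing one
entry point per operand combination.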

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Adjust.
* config/riscv/autovec-vls.md: Ditto.
* config/riscv/autovec.md: Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add insn_type.
(enum insn_flags): Add insn flags.
(emit_vlmax_insn): Adjust.
(emit_vlmax_fp_insn): Delete.
(emit_vlmax_ternary_insn): Delete.
(emit_vlmax_fp_ternary_insn): Delete.
(emit_nonvlmax_insn): Adjust.
(emit_vlmax_slide_insn): Delete.
(emit_nonvlmax_slide_tu_insn): Delete.
(emit_vlmax_merge_insn): Delete.
(emit_vlmax_cmp_insn): Delete.
(emit_vlmax_cmp_mu_insn): Delete.
(emit_vlmax_masked_mu_insn): Delete.
(emit_scalar_move_insn): Delete.
(emit_nonvlmax_integer_move_insn): Delete.
(emit_vlmax_insn_lra): Add.
* config/riscv/riscv-v.cc (get_mask_mode_from_insn_flags): New.
(emit_vlmax_insn): Adjust.
(emit_nonvlmax_insn): Adjust.
(emit_vlmax_insn_lra): Add.
(emit_vlmax_fp_insn): Delete.
(emit_vlmax_ternary_insn): Delete.
(emit_vlmax_fp_ternary_insn): Delete.
(emit_vlmax_slide_insn): Delete.
(emit_nonvlmax_slide_tu_insn): Delete.
(emit_nonvlmax_slide_insn): Delete.
(emit_vlmax_merge_insn): Delete.
(emit_vlmax_cmp_insn): Delete.
(emit_vlmax_cmp_mu_insn): Delete.
(emit_vlmax_masked_insn): Delete.
(emit_nonvlmax_masked_insn): Delete.
(emit_vlmax_masked_store_insn): Delete.
(emit_nonvlmax_masked_store_insn): Delete.
(emit_vlmax_masked_mu_insn): Delete.
(emit_vlmax_masked_fp_mu_insn): Delete.
(emit_nonvlmax_tu_insn): Delete.
(emit_nonvlmax_fp_tu_insn): Delete.
(emit_nonvlmax_tumu_insn): Delete.
(emit_nonvlmax_fp_tumu_insn): Delete.
(emit_scalar_move_insn): Delete.
(emit_cpop_insn): Delete.
(emit_vlmax_integer_move_insn): Delete.
(emit_nonvlmax_integer_move_insn): Delete.
(emit_vlmax_gather_insn): Delete.
(emit_vlmax_masked_gather_mu_insn): Delete.
(emit_vlmax_compress_insn): Delete.
(emit_nonvlmax_compress_insn): Delete.
(emit_vlmax_reduction_insn): Delete.
(emit_vlmax_fp_reduction_insn): Delete.
(emit_nonvlmax_fp_reduction_insn): Delete.
(expand_vec_series): Adjust.
(expand_const_vector): Adjust.
(legitimize_move): Adjust.
(sew64_scalar_helper): Adjust.
(expand_tuple_move): Adjust.
(expand_vector_init_insert_elems): Adjust.
(expand_vector_init_merge_repeating_sequence): Adjust.
(expand_vec_cmp): Adjust.
(expand_vec_cmp_float): Adjust.
(expand_vec_perm): Adjust.
(shuffle_merge_patterns): Adjust.
(shuffle_compress_patterns): Adjust.
(shuffle_decompress_patterns): Adjust.
(expand_load_store): Adjust.
(expand_cond_len_op): Adjust.
(expand_cond_len_unop): Adjust.
(expand_cond_len_binop): Adjust.
(expand_gather_scatter): Adjust.
(expand_cond_len_ternop): Adjust.
(expand_reduction): Adjust.
(expand_lanes_load_store): Adjust.
(expand_fold_extract_last): Adjust.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Adjust.
* config/riscv/vector.md: Adjust.

Adjust gcc.target/i386/pr52252-{atom,core}.c

The following adjusts the testcases to force 128-bit vectorization
to make them more robust when, for example, adding -march=cascadelake.

* gcc.target/i386/pr52252-atom.c: Add -mprefer-vector-width=128.
* gcc.target/i386/pr52252-core.c: Likewise.

rs6000: call vector load/store with length only on 64-bit Power10

gcc/
PR target/96762
* config/rs6000/rs6000-string.cc (expand_block_move): Call vector
load/store with length only on 64-bit Power10.

gcc/testsuite/
PR target/96762
* gcc.target/powerpc/pr96762.c: New.

arc: Honor SWAP option for lsl16 instruction

The LSL16 instruction is only available if SWAP (-mswap) option is
turned on.

gcc/ChangeLog:

* config/arc/arc.cc (arc_split_mov_const): Use LSL16 only when
SWAP option is enabled.
* config/arc/arc.md (ashlsi2_cnt16): Likewise.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>

arm: Remove unsigned variant of vcaddq_m

The unsigned variants of the vcaddq_m operation are not needed within the
compiler, as the assembly output of the signed and unsigned versions of the
ops is identical: with a `.i` suffix (as opposed to separate `.s` and `.u`
suffixes).

Tested with baremetal arm-none-eabi on Arm's fastmodels.

gcc/ChangeLog:

* config/arm/arm-mve-builtins-base.cc (vcaddq_rot90, vcaddq_rot270):
Use common insn for signed and unsigned front-end definitions.
* config/arm/arm_mve_builtins.def
(vcaddq_rot90_m_u, vcaddq_rot270_m_u): Make common.
(vcaddq_rot90_m_s, vcaddq_rot270_m_s): Remove.
* config/arm/iterators.md (mve_insn): Merge signed and unsigned defs.
(isu): Likewise.
(rot): Likewise.
(mve_rot): Likewise.
(supf): Likewise.
(VxCADDQ_M): Likewise.
* config/arm/unspecs.md (unspec): Likewise.
* config/arm/mve.md: Fix minor typo.

Refactor vector HF/BF mode iterators and patterns.

gcc/ChangeLog:

* config/i386/sse.md (<avx512>_blendm<mode>): Merge
VF_AVX512HFBFVL into VI12HFBF_AVX512VL.
(VF_AVX512HFBF16): Renamed to VHFBF.
(VF_AVX512FP16VL): Renamed to VHF_AVX512VL.
(VF_AVX512FP16): Removed.
(div<mode>3): Adjust VF_AVX512FP16VL to VHF_AVX512VL.
(avx512fp16_rcp<mode>2<mask_name>): Ditto.
(rsqrt<mode>2): Ditto.
(<sse>_rsqrt<mode>2<mask_name>): Ditto.
(vcond<mode><code>): Ditto.
(vcond<sseintvecmodelower><mode>): Ditto.
(<avx512>_fmaddc_<mode>_mask1<round_expand_name>): Ditto.
(<avx512>_fmaddc_<mode>_maskz<round_expand_name>): Ditto.
(<avx512>_fcmaddc_<mode>_mask1<round_expand_name>): Ditto.
(<avx512>_fcmaddc_<mode>_maskz<round_expand_name>): Ditto.
(cmla<conj_op><mode>4): Ditto.
(fma_<mode>_fadd_fmul): Ditto.
(fma_<mode>_fadd_fcmul): Ditto.
(fma_<complexopname>_<mode>_fma_zero): Ditto.
(fma_<mode>_fmaddc_bcst): Ditto.
(fma_<mode>_fcmaddc_bcst): Ditto.
(<avx512>_<complexopname>_<mode>_mask<round_name>): Ditto.
(cmul<conj_op><mode>3): Ditto.
(<avx512>_<complexopname>_<mode><maskc_name><round_name>):
Ditto.
(vec_unpacks_lo_<mode>): Ditto.
(vec_unpacks_hi_<mode>): Ditto.
(vec_unpack_<fixprefix>fix_trunc_lo_<mode>): Ditto.
(vec_unpack_<fixprefix>fix_trunc_lo_<mode>): Ditto.
(*vec_extract<mode>_0): Ditto.
(*<avx512>_cmp<mode>3): Extend to V48H_AVX512VL.

RISC-V: Fix vsetvl pass ICE

This patch fixes PR111234, a vsetvl pass ICE that occurs when fusing a
mask-any vlmax vsetvl_vtype_change_only insn with a mu vsetvl insn.

PR target/111234

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Remove condition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr111234.c: New test.

Add overflow API for plus minus mult on range

Previous reviews suggested that adding overflow APIs to range-op would be
useful. These APIs help to check whether overflow can happen when operating
on two ranges with plus, minus, or mult.

Previous discussions are here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624067.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624701.html
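
The idea of an overflow-free check on ranges can be sketched as follows. This
is not the range-op implementation: IntRange and the three functions are
hypothetical, the ranges are closed 32-bit signed intervals, and the check is
done by evaluating the interval corners in 64-bit arithmetic.

```cpp
#include <cstdint>

// Closed interval [lo, hi] of 32-bit signed values, with lo <= hi.
struct IntRange {
  std::int32_t lo;
  std::int32_t hi;
};

// True if a 64-bit value is representable as a 32-bit signed integer.
static bool fits_i32(std::int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}

// x + y cannot overflow for any x in a, y in b.
bool plus_overflow_free(IntRange a, IntRange b) {
  return fits_i32(std::int64_t{a.lo} + b.lo)
         && fits_i32(std::int64_t{a.hi} + b.hi);
}

// x - y cannot overflow for any x in a, y in b.
bool minus_overflow_free(IntRange a, IntRange b) {
  return fits_i32(std::int64_t{a.lo} - b.hi)
         && fits_i32(std::int64_t{a.hi} - b.lo);
}

// x * y cannot overflow: the extremes of a product over a box are at corners.
bool mult_overflow_free(IntRange a, IntRange b) {
  return fits_i32(std::int64_t{a.lo} * b.lo)
         && fits_i32(std::int64_t{a.lo} * b.hi)
         && fits_i32(std::int64_t{a.hi} * b.lo)
         && fits_i32(std::int64_t{a.hi} * b.hi);
}
```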

gcc/ChangeLog:

* range-op-mixed.h (operator_plus::overflow_free_p): New declare.
(operator_minus::overflow_free_p): New declare.
(operator_mult::overflow_free_p): New declare.
* range-op.cc (range_op_handler::overflow_free_p): New function.
(range_operator::overflow_free_p): New default function.
(operator_plus::overflow_free_p): New function.
(operator_minus::overflow_free_p): New function.
(operator_mult::overflow_free_p): New function.
* range-op.h (range_op_handler::overflow_free_p): New declare.
(range_operator::overflow_free_p): New declare.
* value-range.cc (irange::nonnegative_p): New function.
(irange::nonpositive_p): New function.
* value-range.h (irange::nonnegative_p): New declare.
(irange::nonpositive_p): New declare.

Daily bump.

analyzer: implement reference count checking for CPython plugin [PR107646]

This patch introduces initial support for reference count checking of
PyObjects in relation to the Python/C API for the CPython plugin.
Additionally, the core analyzer underwent several modifications to
accommodate this feature. These include:

- Introducing support for callbacks at the end of
  region_model::pop_frame. This is our current point of validation for
  the reference count of PyObjects.
- An added optional custom stmt_finder parameter to
  region_model_context::warn. This aids in emitting a diagnostic
  concerning the reference count, especially when the stmt_finder is
  NULL, which is currently the case during region_model::pop_frame.

The current diagnostic we emit relating to the reference count
appears as follows:

rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’ but
ob_refcnt field is: ‘2’
   23 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   14 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
    |......
    |   23 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |

This is a WIP in several ways:
- Currently, functions returning PyObject * are assumed to always
  produce a new reference.
- The validation of reference count is only for PyObjects created within
  a function body. Verifying reference counts for PyObjects passed as
  parameters is not supported in this patch.
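
Why the item in the diagnostic above ends up with a reference count of 2 can
be sketched with a mock that does not use the Python/C API at all. The types
and functions below are hypothetical stand-ins; the one property taken from
CPython is that PyList_Append takes its own reference to the item rather than
stealing the caller's.

```cpp
#include <vector>

// Hypothetical stand-in for a reference-counted object, born with one owner.
struct MockObject {
  long refcnt = 1;
};

// Hypothetical stand-in for a list holding references to objects.
struct MockList {
  std::vector<MockObject*> items;
};

// Like PyList_Append: the list takes a new reference to the item.
void mock_list_append(MockList& list, MockObject& item) {
  ++item.refcnt;
  list.items.push_back(&item);
}

// Like Py_DECREF, minus deallocation: drops the caller's reference.
void mock_decref(MockObject& obj) { --obj.refcnt; }

// The caller keeps its own reference after appending: count is 2.
long refcnt_without_decref() {
  MockObject item;
  MockList list;
  mock_list_append(list, item);
  return item.refcnt;
}

// The caller releases its reference after appending: count is 1.
long refcnt_with_decref() {
  MockObject item;
  MockList list;
  mock_list_append(list, item);
  mock_decref(item);
  return item.refcnt;
}
```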

gcc/analyzer/ChangeLog:
PR analyzer/107646
* engine.cc (impl_region_model_context::warn): New optional
parameter.
* exploded-graph.h (class impl_region_model_context): Likewise.
* region-model.cc (region_model::pop_frame): New callback
feature for region_model::pop_frame.
* region-model.h (struct append_regions_cb_data): Likewise.
(class region_model): Likewise.
(class region_model_context): New optional parameter.
(class region_model_context_decorator): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference
count checking for PyObjects.
* gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here
(and added more tests).
* gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
* gcc.dg/plugin/cpython-plugin-test-no-Python-h.c: ...here (and
added more tests).
* gcc.dg/plugin/plugin.exp: New tests.
* gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
* gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.

Signed-off-by: Eric Feng <ef2648@columbia.edu>

Analyzer: include algorithm header

gcc/analyzer/ChangeLog:

* region-model.cc: Define INCLUDE_ALGORITHM.

pru: Add cstore expansion patterns

Add cstore patterns for the two specific operations which can be
efficiently expanded using the UMIN instruction:
X != 0
X == 0
The rest of the operations are rejected, and left to be expanded
by the common expansion code.
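
The arithmetic behind the two cases can be sketched as below, using std::min
as a stand-in for the unsigned-minimum operation. The function names are
hypothetical, and the way the X == 0 result is derived from the minimum here
is an assumption, not necessarily the sequence the PRU patterns emit.

```cpp
#include <algorithm>
#include <cstdint>

// X != 0: umin(X, 1) is 0 when X is 0 and 1 otherwise.
std::uint32_t cstore_ne0(std::uint32_t x) {
  return std::min<std::uint32_t>(x, 1u);
}

// X == 0: the complement of the result above (assumed formulation).
std::uint32_t cstore_eq0(std::uint32_t x) {
  return 1u - std::min<std::uint32_t>(x, 1u);
}
```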

PR target/106562

gcc/ChangeLog:

* config/pru/predicates.md (const_0_operand): New predicate.
(pru_cstore_comparison_operator): Ditto.
* config/pru/pru.md (cstore<mode>4): New pattern.
(cstoredi4): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/pru/pr106562-10.c: New test.
* gcc.target/pru/pr106562-11.c: New test.
* gcc.target/pru/pr106562-5.c: New test.
* gcc.target/pru/pr106562-6.c: New test.
* gcc.target/pru/pr106562-7.c: New test.
* gcc.target/pru/pr106562-8.c: New test.
* gcc.target/pru/pr106562-9.c: New test.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

c++: CWG 2359, wrong copy-init with designated init [PR91319]

This CWG issue clarifies that designated initializers support direct-initialization.
Just be careful what Note 2 in [dcl.init.aggr]/4.2 says: "If the
initialization is by designated-initializer-clause, its form determines
whether copy-initialization or direct-initialization is performed." Hence
this patch sets CONSTRUCTOR_IS_DIRECT_INIT only when we are dealing with
".x{}", but not ".x = {}".

PR c++/91319

gcc/cp/ChangeLog:

* parser.cc (cp_parser_initializer_list): Set CONSTRUCTOR_IS_DIRECT_INIT
when the designated initializer is of the .x{} form.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/desig30.C: New test.

c++: disallow constinit on functions [PR111173]

[dcl.constinit]/1: The constinit specifier shall be applied only to a declaration
of a variable with static or thread storage duration.

and while we detect

  constinit int fn();

we weren't detecting

  using F = int();
  constinit F f;

PR c++/111173

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Disallow constinit on functions.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constinit19.C: New test.

tree-optimization/111228 - fix testcase

* gcc.dg/tree-ssa/forwprop-42.c: Use __UINT64_TYPE__ instead
of unsigned long.

test: Add xfail into slp-reduc-7.c for RVV VLA vectorization

Like ARM SVE, add RVV variable length xfail.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-reduc-7.c: Add RVV.

test: Adapt slp-26.c check for RVV

Fix FAILs:
FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-26.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 0
FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using SLP" 0

RVV is able to vectorize it with VLS modes, like amdgcn.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-26.c: Adapt for RVV.

fortran: Restore interface to its previous state on error [PR48776]

Keep memory of the content of the current interface body being parsed
and restore it to its previous state if it has been modified at the time
a parse attempt fails.

This fixes memory errors and random segmentation faults caused by
dangling symbol pointers kept in interfaces' linked lists of symbols.
If a parsing attempt fails and symbols are freed, they should also be
removed from the current interface linked list.

As the list of symbols is a linked list, and parsing only adds new
symbols to the head of the list, all that is needed to track the
previous content of the list is a pointer to its previous head.
This adds such a pointer, and the restoration of the list of symbols
to that pointer on error.
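
The head-pointer technique described above can be sketched on a generic
singly linked list. The names below are hypothetical and the sketch is not
the gfortran code: it only shows that, when elements are added at the head
only, remembering the old head is enough to undo later additions.

```cpp
#include <cstddef>

// Singly linked list node; new nodes are only ever added at the head.
struct Node {
  int sym;
  Node* next;
};

struct Interface {
  Node* head = nullptr;
};

// Adds a symbol at the head of the list.
void push(Interface& ifc, int sym) { ifc.head = new Node{sym, ifc.head}; }

// Frees every node added after `saved` was the head, restoring that state.
void drop_until(Interface& ifc, Node* saved) {
  while (ifc.head != nullptr && ifc.head != saved) {
    Node* n = ifc.head;
    ifc.head = n->next;
    delete n;
  }
}

// Counts the nodes currently in the list.
std::size_t length(const Interface& ifc) {
  std::size_t count = 0;
  for (const Node* n = ifc.head; n != nullptr; n = n->next) ++count;
  return count;
}
```

Passing a null saved head to drop_until frees the whole list, which mirrors
how a general "free until" helper can also serve as the full free routine.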

PR fortran/48776

gcc/fortran/ChangeLog:

* gfortran.h (gfc_drop_interface_elements_before): New prototype.
(gfc_current_interface_head): Return a reference to the pointer.
* interface.cc (gfc_current_interface_head): Ditto.
(free_interface_elements_until): New function, generalizing
gfc_free_interface.
(gfc_free_interface): Use free_interface_elements_until.
(gfc_drop_interface_elements_before): New function.
* parse.cc
(current_interface_ptr, previous_interface_head): New static variables.
(current_interface_valid_p, get_current_interface_ptr): New functions.
(decode_statement): Initialize previous_interface_head.
(reject_statement): Restore current interface pointer to point to
previous_interface_head.

gcc/testsuite/ChangeLog:

* gfortran.dg/interface_procedure_1.f90: New test.

tree-optimization/111228 - combine two VEC_PERM_EXPRs

The following adds simplification of two VEC_PERM_EXPRs where
the later one replaces all elements from either the first or the
second input of the earlier permute. This allows a three input
permute to be simplified to a two input one.

I'm following the existing two input simplification case and only
allow non-VLA permutes. The now existing three cases and the
single case in tree-ssa-forwprop.cc somehow ask for merging,
I'm not doing this as part of this change though.
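
The condition under which two permutes collapse can be sketched on plain
arrays. This is an illustration only, not the match.pd pattern: the function
names are hypothetical, and vec_perm below models a fixed-length two-input
permute where selector values below n pick from the first input and the rest
pick from the second.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;
using Sel = std::vector<std::size_t>;

// Two-input permute: lane i takes v0[s] when s < n, else v1[s - n],
// where s is sel[i] modulo 2n.
Vec vec_perm(const Vec& v0, const Vec& v1, const Sel& sel) {
  const std::size_t n = v0.size();
  Vec out(n);
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t s = sel[i] % (2 * n);
    out[i] = s < n ? v0[s] : v1[s - n];
  }
  return out;
}

// Tries to rewrite vec_perm(vec_perm(a, b, sel1), c, sel2) as
// vec_perm(x, c, sel3) with x being a or b. Succeeds only if the lanes the
// outer permute keeps from the inner result all come from the same input.
bool combine_perms(const Sel& sel1, const Sel& sel2, bool& keeps_b, Sel& sel3) {
  const std::size_t n = sel1.size();
  bool from_a = false;
  bool from_b = false;
  sel3.assign(n, 0);
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t s2 = sel2[i] % (2 * n);
    if (s2 >= n) {  // lane comes from c, which stays the second input
      sel3[i] = s2;
      continue;
    }
    const std::size_t s1 = sel1[s2] % (2 * n);
    if (s1 < n) {
      from_a = true;
      sel3[i] = s1;
    } else {
      from_b = true;
      sel3[i] = s1 - n;
    }
  }
  if (from_a && from_b) return false;  // still needs three inputs
  keeps_b = from_b;
  return true;
}
```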

PR tree-optimization/111228
* match.pd ((vec_perm (vec_perm ..) @5 ..) -> (vec_perm @x @5 ..)):
New simplifications.

* gcc.dg/tree-ssa/forwprop-42.c: New testcase.

RISC-V: Remove movmisalign pattern for VLA modes

This patch fixes the following failures in the "vect" testsuite:
FAIL: gcc.dg/vect/pr63341-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-1.c execution test
FAIL: gcc.dg/vect/pr63341-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr63341-2.c execution test
FAIL: gcc.dg/vect/pr94994.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr94994.c execution test
FAIL: gcc.dg/vect/vect-align-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-1.c execution test
FAIL: gcc.dg/vect/vect-align-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-align-2.c execution test

Spike report:
z  0000000000000000 ra 00000000000100f4 sp 0000003ffffffb30 gp 0000000000012cc8
tp 0000000000000000 t0 00000000000102d4 t1 000000000000000f t2 0000000000000000
s0 0000000000000000 s1 0000000000000000 a0 00000000000101a6 a1 0000000000000008
a2 0000000000000010 a3 0000000000012401 a4 0000000000012480 a5 0000000000000020
a6 000000000000001f a7 00000000000000d6 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 00000000000101ec va/inst 000000000206dc07 sr 8000000200006620
Load access fault!

(spike)
core   0: 0x0000000000010204 (0x02065087) vle16.v v1, (a2)
core   0: exception trap_load_address_misaligned, epc 0x0000000000010204
core   0:           tval 0x0000000000012c81
(spike) reg 0 a2
0x0000000000012c81

According to the RVV ISA, we cannot use "vle16.v" if the address is only byte-aligned.

Such issue is caused by this GIMPLE IR:

vect__1.15_17 = .MASK_LEN_LOAD (vectp_t.13_15, 8B, { -1, ... }, _24, 0);

For partial vectorization, the alignment "8B" (byte-aligned) is incorrect here.

After this patch, the loop is no longer vectorized:

sll     a5,a4,0x1
add     a5,a5,a1
lhu     a3,64(a5)
lbu     a5,66(a5)
addw    a4,a4,1
srl     a3,a3,0x8
sll     a5,a5,0x8
or      a5,a5,a3
sh      a5,0(a2)
add     a2,a2,2
bne     a4,a0,101f8 <foo+0x14>

I will re-enable auto-vectorization using another approach in a following patch.
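
The alignment problem can be sketched in scalar C++. The helpers below are
hypothetical: one checks whether an address is suitably aligned for a 16-bit
element access, the other loads a 16-bit value from any address by copying
bytes, which is the kind of access that stays valid on a byte-aligned pointer.

```cpp
#include <cstdint>
#include <cstring>

// True if addr is a multiple of the natural alignment of a 16-bit element.
bool is_aligned_for_u16(std::uintptr_t addr) {
  return addr % alignof(std::uint16_t) == 0;
}

// Loads a 16-bit value from any address by copying its bytes.
std::uint16_t load_u16_any_alignment(const unsigned char* p) {
  std::uint16_t value;
  std::memcpy(&value, p, sizeof value);
  return value;
}
```

With the faulting address from the Spike report, 0x12c81, the alignment check
fails, while the neighbouring address 0x12c80 passes.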

gcc/ChangeLog:

* config/riscv/autovec.md (movmisalign<mode>): Delete.

test: Fix XPASS of RVV

XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4f.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4g.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4k.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects  scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
XPASS: gcc.dg/vect/vect-outer-4l.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1

As with ARM SVE, fix these XPASSes for RVV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-double-reduc-5.c: Add riscv.
* gcc.dg/vect/vect-outer-4e.c: Ditto.
* gcc.dg/vect/vect-outer-4f.c: Ditto.
* gcc.dg/vect/vect-outer-4g.c: Ditto.
* gcc.dg/vect/vect-outer-4k.c: Ditto.
* gcc.dg/vect/vect-outer-4l.c: Ditto.

test: Add xfail for riscv_vector

As with ARM SVE, when scalable vectorization is enabled for RVV,
these cases cannot be constant folded yet.

Ok for trunk ?

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr88598-1.c: Add riscv_vector.
* gcc.dg/vect/pr88598-2.c: Ditto.
* gcc.dg/vect/pr88598-3.c: Ditto.

RISC-V: support cm.mva01s cm.mvsa01 in zcmp

Signed-off-by: Die Li <lidie@eswincomputing.com>
Co-Authored-By: Fei Gao <gaofei@eswincomputing.com>
gcc/ChangeLog:

* config/riscv/peephole.md: New pattern.
* config/riscv/predicates.md (a0a1_reg_operand): New predicate.
(zcmp_mv_sreg_operand): New predicate.
* config/riscv/riscv.md: New predicate.
* config/riscv/zc.md (*mva01s<X:mode>): New pattern.
(*mvsa01<X:mode>): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cm_mv_rv32.c: New test.

RISC-V: support cm.popretz in zcmp

Generate cm.popretz instead of cm.popret if return value is 0.

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_zcmp_can_use_popretz): true if popretz can be used
(riscv_gen_multi_pop_insn): interface to generate cm.pop[ret][z]
(riscv_expand_epilogue): expand cm.pop[ret][z] in epilogue
* config/riscv/riscv.md: define A0_REGNUM
* config/riscv/zc.md
(@gpr_multi_popretz_up_to_ra_<mode>): md for popretz ra
(@gpr_multi_popretz_up_to_s0_<mode>): md for popretz ra, s0
(@gpr_multi_popretz_up_to_s1_<mode>): likewise
(@gpr_multi_popretz_up_to_s2_<mode>): likewise
(@gpr_multi_popretz_up_to_s3_<mode>): likewise
(@gpr_multi_popretz_up_to_s4_<mode>): likewise
(@gpr_multi_popretz_up_to_s5_<mode>): likewise
(@gpr_multi_popretz_up_to_s6_<mode>): likewise
(@gpr_multi_popretz_up_to_s7_<mode>): likewise
(@gpr_multi_popretz_up_to_s8_<mode>): likewise
(@gpr_multi_popretz_up_to_s9_<mode>): likewise
(@gpr_multi_popretz_up_to_s11_<mode>): likewise

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: add testcase for cm.popretz in rv32e
* gcc.target/riscv/rv32i_zcmp.c: add testcase for cm.popretz in rv32i

RISC-V: support cm.push cm.pop cm.popret in zcmp

Zcmp can share the same logic as save-restore in stack allocation: pre-allocation
by cm.push, step 1 and step 2.

Pre-allocation not only saves callee saved GPRs, but also saves callee saved FPRs and
local variables if any.

Please note that cm.push pushes ra, s0-s11 in the reverse order of what save-restore does,
so the .cfi directives have been adapted accordingly in this patch.

gcc/ChangeLog:

* config/riscv/iterators.md
(slot0_offset): slot 0 offset in stack GPRs area in bytes
(slot1_offset): slot 1 offset in stack GPRs area in bytes
(slot2_offset): likewise
(slot3_offset): likewise
(slot4_offset): likewise
(slot5_offset): likewise
(slot6_offset): likewise
(slot7_offset): likewise
(slot8_offset): likewise
(slot9_offset): likewise
(slot10_offset): likewise
(slot11_offset): likewise
(slot12_offset): likewise
* config/riscv/predicates.md
(stack_push_up_to_ra_operand): predicates of stack adjust pushing ra
(stack_push_up_to_s0_operand): predicates of stack adjust pushing ra, s0
(stack_push_up_to_s1_operand): likewise
(stack_push_up_to_s2_operand): likewise
(stack_push_up_to_s3_operand): likewise
(stack_push_up_to_s4_operand): likewise
(stack_push_up_to_s5_operand): likewise
(stack_push_up_to_s6_operand): likewise
(stack_push_up_to_s7_operand): likewise
(stack_push_up_to_s8_operand): likewise
(stack_push_up_to_s9_operand): likewise
(stack_push_up_to_s11_operand): likewise
(stack_pop_up_to_ra_operand): predicates of stack adjust popping ra
(stack_pop_up_to_s0_operand): predicates of stack adjust popping ra, s0
(stack_pop_up_to_s1_operand): likewise
(stack_pop_up_to_s2_operand): likewise
(stack_pop_up_to_s3_operand): likewise
(stack_pop_up_to_s4_operand): likewise
(stack_pop_up_to_s5_operand): likewise
(stack_pop_up_to_s6_operand): likewise
(stack_pop_up_to_s7_operand): likewise
(stack_pop_up_to_s8_operand): likewise
(stack_pop_up_to_s9_operand): likewise
(stack_pop_up_to_s11_operand): likewise
* config/riscv/riscv-protos.h
(riscv_zcmp_valid_stack_adj_bytes_p): declaration
* config/riscv/riscv.cc (struct riscv_frame_info): comment change
(riscv_avoid_multi_push): helper function of riscv_use_multi_push
(riscv_use_multi_push): true if multi push is used
(riscv_multi_push_sregs_count): num of sregs in multi-push
(riscv_multi_push_regs_count): num of regs in multi-push
(riscv_16bytes_align): align to 16 bytes
(riscv_stack_align): moved to a better place
(riscv_save_libcall_count): no functional change
(riscv_compute_frame_info): add zcmp frame info
(riscv_for_each_saved_reg): save or restore fprs in specified slot for zcmp
(riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push
(riscv_gen_multi_push_pop_insn): gen function for multi push and pop
(get_multi_push_fpr_mask): get mask for the fprs pushed by cm.push
(riscv_expand_prologue): allocate stack by cm.push
(riscv_adjust_multi_pop_cfi_epilogue): adjust cfi for cm.pop[ret]
(riscv_expand_epilogue): allocate stack by cm.pop[ret]
(zcmp_base_adj): calculate stack adjustment base size
(zcmp_additional_adj): calculate stack adjustment additional size
(riscv_zcmp_valid_stack_adj_bytes_p): check if stack adjustment valid
* config/riscv/riscv.h (RETURN_ADDR_MASK): mask of ra
(S0_MASK): likewise
(S1_MASK): likewise
(S2_MASK): likewise
(S3_MASK): likewise
(S4_MASK): likewise
(S5_MASK): likewise
(S6_MASK): likewise
(S7_MASK): likewise
(S8_MASK): likewise
(S9_MASK): likewise
(S10_MASK): likewise
(S11_MASK): likewise
(MULTI_PUSH_GPR_MASK): GPR_MASK that cm.push can cover at most
(ZCMP_MAX_SPIMM): max spimm value
(ZCMP_SP_INC_STEP): zcmp sp increment step
(ZCMP_INVALID_S0S10_SREGS_COUNTS): num of s0-s10
(ZCMP_S0S11_SREGS_COUNTS): num of s0-s11
(ZCMP_MAX_GRP_SLOTS): max slots of pushing and popping in zcmp
(CALLEE_SAVED_FREG_NUMBER): get x of fsx(fs0 ~ fs11)
* config/riscv/riscv.md: include zc.md
* config/riscv/zc.md: New file. machine description for zcmp

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: New test.
* gcc.target/riscv/rv32i_zcmp.c: New test.
* gcc.target/riscv/zcmp_push_fpr.c: New test.
* gcc.target/riscv/zcmp_stack_alignment.c: New test.

tree-ssa-strlen: Fix up handling of conditionally zero memcpy [PR110914]

The following testcase is miscompiled since r279392 aka r10-5451-gef29b12cfbb4979.
The strlen pass has the adjust_last_stmt function, which mainly performs strcat
or strcat-like optimizations (say strcpy (x, "abcd"); strcat (x, p);
or the equivalent memcpy (x, "abcd", strlen ("abcd") + 1); char *q = strchr (x, 0);
memcpy (q, p, strlen (p)); etc.), where the first stmt stores a '\0' character
at the end but the next one immediately overwrites it, so the first memcpy can be
adjusted to store 1 fewer byte.  handle_builtin_memcpy called this function
in two spots, the first one guarded like:
  if (olddsi != NULL
      && tree_fits_uhwi_p (len)
      && !integer_zerop (len))
    adjust_last_stmt (olddsi, stmt, false);
i.e. only for a constant non-zero length.  The other spot can call it even
for a non-constant length, but in that case we punt earlier if that length
isn't the length of some string + 1, so it is again non-zero.
The r279392 change, I assume, wanted to add some warning handling and changed it
like:
   if (olddsi != NULL
-      && tree_fits_uhwi_p (len)
       && !integer_zerop (len))
-    adjust_last_stmt (olddsi, stmt, false);
+    {
+      maybe_warn_overflow (stmt, len, rvals, olddsi, false, true);
+      adjust_last_stmt (olddsi, stmt, false);
+    }
While maybe_warn_overflow possibly handles non-constant length fine,
adjust_last_stmt really relies on length to be non-zero, which
!integer_zerop (len) alone doesn't guarantee.  While we could for
len being SSA_NAME ask the ranger or tree_expr_nonzero_p, I think
adjust_last_stmt will not benefit from it much, so the following patch
just restores the above condition/previous behavior for the adjust_last_stmt
call only.
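
A minimal C sketch of why a zero length matters here. It is not the
pr110914.c testcase; the two helper names are hypothetical, and the
"shortened" variant models by hand what storing 1 fewer byte in the first
memcpy would do.  The shortening is only harmless when the second copy
really overwrites the '\0', i.e. when its length is non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What the source asks for: store "abcd" with its terminating NUL,
   then copy N bytes on top of that NUL.  Returns 1 if buf[4] is
   still '\0' afterwards.  (Hypothetical helper.)  */
static int
copy_as_written (char *buf, const char *p, size_t n)
{
  assert (buf != NULL && p != NULL);
  memcpy (buf, "abcd", 5);   /* stores 'a' 'b' 'c' 'd' '\0' */
  memcpy (buf + 4, p, n);    /* overwrites the NUL only if n != 0 */
  return buf[4] == '\0';
}

/* Hand-written model of the adjusted first store: one byte fewer,
   so the NUL is never written.  (Hypothetical helper.)  */
static int
copy_with_shortened_store (char *buf, const char *p, size_t n)
{
  assert (buf != NULL && p != NULL);
  memcpy (buf, "abcd", 4);   /* the '\0' is no longer stored */
  memcpy (buf + 4, p, n);
  return buf[4] == '\0';
}
```

With a buffer pre-filled with non-zero bytes, the two variants agree for
n == 2 but differ for n == 0, where only the first leaves a terminated
string.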

2023-08-30  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/110914
* tree-ssa-strlen.cc (strlen_pass::handle_builtin_memcpy): Don't call
adjust_last_stmt unless len is known constant.

* gcc.c-torture/execute/pr110914.c: New test.

store-merging: Fix up >= 64 bit insertion [PR111015]

The following testcase shows that we mishandle bit insertion for
info->bitsize >= 64.  The problem is in using an unsigned HOST_WIDE_INT
shift + subtraction + build_int_cst to compute the mask: the shift invokes
UB at compile time for info->bitsize of 64 and larger, and e.g. on the testcase
it happens to compute a mask of 0x3f rather than
0x3f'ffffffff'ffffffff.

The patch fixes that by using wide_int wi::mask + wide_int_to_tree, so it
handles masks in any precision (up to WIDE_INT_MAX_PRECISION ;) ).
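
As a rough illustration only (this is not the wi::mask implementation, and
make_low_mask is a hypothetical helper), a mask wider than 64 bits can be
built limb by limb so that no shift count ever reaches 64:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill LIMBS (least significant limb first) with a mask that has the
   BITS low bits set.  Hypothetical helper for illustration.  */
static void
make_low_mask (uint64_t *limbs, size_t nlimbs, unsigned bits)
{
  assert (bits <= 64 * nlimbs);
  for (size_t i = 0; i < nlimbs; i++)
    {
      if (bits >= 64)
        {
          /* Whole limb is set; a shift by 64 would be undefined.  */
          limbs[i] = UINT64_MAX;
          bits -= 64;
        }
      else
        {
          /* bits < 64 here, so the shift is well defined.  */
          limbs[i] = (UINT64_C (1) << bits) - 1;
          bits = 0;
        }
    }
}
```

For a 70-bit mask over two limbs this yields 0xffffffffffffffff in the low
limb and 0x3f in the high limb, i.e. 0x3f'ffffffff'ffffffff.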

2023-08-30 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/111015
* gimple-ssa-store-merging.cc
(imm_store_chain_info::output_merged_store): Use wi::mask and
wide_int_to_tree instead of unsigned HOST_WIDE_INT shift and
build_int_cst to build BIT_AND_EXPR mask.

* gcc.dg/pr111015.c: New test.

middle-end: Apply MASK_LEN_LOAD_LANES/MASK_LEN_STORE_LANES to ivopts/alias

Like MASK_LOAD_LANES/MASK_STORE_LANES, add MASK_LEN_ variant.

Bootstrap and Regression on X86 passed.

Ok for trunk?

gcc/ChangeLog:

* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Add MASK_LEN_ variant.
(call_may_clobber_ref_p_1): Ditto.
* tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto.
(get_alias_ptr_type_for_ptr_address): Ditto.

RISC-V: Make arch-24.c to test "success" case

arch-24.c and arch-25.c are exactly the same and redundant. The author
suspects that the original author intended to test two base ISAs (RV32I and
RV64I) so this commit changes arch-24.c to test that RV32I+Zcf does not
cause any errors.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-24.c: Test RV32I+Zcf instead.

RISC-V: Make sure we get VL REG operand for VLMAX vsetvl

Fix ICE in "vect" testsuite:

FAIL: gcc.dg/vect/pr64495.c (internal compiler error: in df_uses_record, at df-scan.cc:2958)
FAIL: gcc.dg/vect/pr64495.c (test for excess errors)

After this patch, all currently known VSETVL-pass-related bugs in "vect" are fixed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc
(vector_insn_info::get_avl_or_vl_reg): Fix bug.

RISC-V: Enable movmisalign for VLS modes

The previous patch (which removed the movmisalign pattern for VLA modes) fixed a run-time bug,
but it also disabled vectorization of misaligned data movement.

After checking the LLVM code, I found that LLVM supports misaligned accesses for VLS modes.

Before this patch:

sll     a5,a4,0x1
add     a5,a5,a1
lhu     a3,64(a5)
lbu     a5,66(a5)
addw    a4,a4,1
srl     a3,a3,0x8
sll     a5,a5,0x8
or      a5,a5,a3
sh      a5,0(a2)
add     a2,a2,2
bne     a4,a0,101f8 <foo+0x14>

After this patch:

foo:
lui a0,%hi(.LANCHOR0)
addi a0,a0,%lo(.LANCHOR0)
addi sp,sp,-16
addi a1,a0,1
li a2,64
sd ra,8(sp)
vsetvli zero,a2,e8,m4,ta,ma
addi a0,a0,128
vle8.v v4,0(a1)
vse8.v v4,0(a0)
call memcmp
bne a0,zero,.L6
ld ra,8(sp)
addi sp,sp,16
jr ra
.L6:
call abort

Note this patch has passed all testcases in "vect" which are related to alignment.

gcc/ChangeLog:

* config/riscv/autovec-vls.md (movmisalign<mode>): New pattern.
* config/riscv/riscv.cc (riscv_support_vector_misalignment): Support
VLS misalign.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: New test.

Daily bump.

RISC-V: Use splitter to generate zicond in another case

So in analyzing Ventana's internal tree against the trunk it became apparent
that the current zicond code is missing a case that helps coremark's bitwise
CRC implementation.

Here's a minimized testcase:

long xor1(long crc, long poly)
{
  if (crc & 1)
    crc ^= poly;

  return crc;
}

i.e., it's just a conditional xor.

We generate this:

        andi    a5,a0,1
        neg     a5,a5
        and     a5,a5,a1
        xor     a0,a5,a0
        ret

But we should instead generate:

        andi    a5,a0,1
        czero.eqz       a5,a1,a5
        xor     a0,a5,a0
        ret

Combine wants to generate:

Trying 7, 8 -> 9:
    7: r140:DI=r137:DI&0x1
    8: r141:DI=-r140:DI
      REG_DEAD r140:DI
    9: r142:DI=r141:DI&r144:DI
      REG_DEAD r144:DI
      REG_DEAD r141:DI
Failed to match this instruction:
(set (reg:DI 142)
    (and:DI (sign_extract:DI (reg/v:DI 137 [ crc ])
            (const_int 1 [0x1])
            (const_int 0 [0]))
        (reg:DI 144)))

A splitter can rewrite the above into a suitable if-then-else construct and
squeeze an instruction out of that pesky CRC loop.  Sadly it doesn't really
help anything else.

The patch includes two variants.  One that uses ZBS, the other uses an ANDI
logical to produce the input condition.
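
The equivalence between the three forms can be sketched in plain C.  The
czero_eqz function below is only a software model of the instruction
(rd = rs2 == 0 ? 0 : rs1), and the xor1_* names are hypothetical; this is
not code from the patch.

```c
#include <assert.h>

/* Software model of "czero.eqz rd, rs1, rs2".  */
static long
czero_eqz (long rs1, long rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* The minimized testcase as written.  */
static long
xor1_branchy (long crc, long poly)
{
  if (crc & 1)
    crc ^= poly;
  return crc;
}

/* The andi/neg/and/xor sequence currently generated.  */
static long
xor1_mask (long crc, long poly)
{
  long t = crc & 1;   /* andi: 0 or 1 */
  t = -t;             /* neg: 0 or all ones */
  t &= poly;          /* and: 0 or poly */
  return t ^ crc;     /* xor */
}

/* The andi/czero.eqz/xor sequence we would like instead.  */
static long
xor1_zicond (long crc, long poly)
{
  long cond = crc & 1;                /* andi */
  long t = czero_eqz (poly, cond);    /* 0 if cond == 0, else poly */
  return t ^ crc;                     /* xor */
}
```

All three return the same value for every input; the zicond form simply
replaces the neg/and pair with a single conditional-zero operation.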

gcc/
* config/riscv/zicond.md: New splitters to rewrite single bit
sign extension as the condition to a czero in the desired form.

gcc/testsuite
* gcc.target/riscv/zicond-xor-01.c: New test.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

analyzer: new warning: -Wanalyzer-overlapping-buffers [PR99860]

gcc/ChangeLog:
PR analyzer/99860
* Makefile.in (ANALYZER_OBJS): Add analyzer/ranges.o.

gcc/analyzer/ChangeLog:
PR analyzer/99860
* analyzer-selftests.cc (selftest::run_analyzer_selftests): Call
selftest::analyzer_ranges_cc_tests.
* analyzer-selftests.h (selftest::run_analyzer_selftests): New
decl.
* analyzer.opt (Wanalyzer-overlapping-buffers): New option.
* call-details.cc: Include "analyzer/ranges.h" and "make-unique.h".
(class overlapping_buffers): New.
(call_details::complain_about_overlap): New.
* call-details.h (call_details::complain_about_overlap): New decl.
* kf.cc (kf_memcpy_memmove::impl_call_pre): Call
cd.complain_about_overlap for memcpy and memcpy_chk.
(kf_strcat::impl_call_pre): Call cd.complain_about_overlap.
(kf_strcpy::impl_call_pre): Likewise.
* ranges.cc: New file.
* ranges.h: New file.

gcc/ChangeLog:
PR analyzer/99860
* doc/invoke.texi: Add -Wanalyzer-overlapping-buffers.

gcc/testsuite/ChangeLog:
PR analyzer/99860
* c-c++-common/analyzer/overlapping-buffers.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c++: tweaks for explicit conversion fns diagnostic

1) When saying that a conversion is erroneous because it would use
an explicit constructor, it might be nice to show where exactly
the explicit constructor is located.  For example, with this patch:

[...]
explicit.C:4:12: note: 'S::S(int)' declared here
    4 |   explicit S(int) { }
      |            ^

2) When a conversion doesn't work out merely because the conversion
function necessary to do the conversion couldn't be used because
it was marked explicit, it would be useful to the user to say so,
rather than just saying "cannot convert".  For example, with this patch:

explicit.C:13:12: error: cannot convert 'S' to 'bool' in initialization
   13 |   bool b = S{1};
      |            ^~~~
      |            |
      |            S
explicit.C:5:12: note: explicit conversion function was not considered
    5 |   explicit operator bool() const { return true; }
      |            ^~~~~~~~

gcc/cp/ChangeLog:

* call.cc (convert_like_internal): Show where the conversion function
was declared.
(maybe_show_nonconverting_candidate): New.
* cp-tree.h (maybe_show_nonconverting_candidate): Declare.
* typeck.cc (convert_for_assignment): Call it.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/explicit.C: New test.

RISC-V: Added zvfh support for zfa extensions.

This is a follow-up to the zfa extension, adding zvfh support according to the
recommendations and patch of Tsukasa OI <research_trasio@irq.a4lg.com>.  The
zfa-fli-5.c test is also based on that patch.

Ref:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627284.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628492.html

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli):
zvfh can generate zfa extended instruction fli.h, just like zfh.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fli-7.c: Change fa0 to fa\[0-9\] to avoid
assigning register numbers that are non-zero.
* gcc.target/riscv/zfa-fli-8.c: Ditto.
* gcc.target/riscv/zfa-fli-5.c: New test.

RISC-V: generate builtin macro for compilation with strict alignment

Distinguish between explicit -mstrict-align and cpu tune param
for slow_unaligned_access=true/false.
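
A small sketch of how user code might consume these macros, assuming only
the three macro names listed in the ChangeLog below; the function name is
hypothetical and the fallback branch covers non-RISC-V targets or compilers
that predate this change.

```c
#include <assert.h>
#include <string.h>

/* Report which unaligned-access macro, if any, the compiler defined.
   Hypothetical helper for illustration.  */
static const char *
unaligned_access_policy (void)
{
#if defined (__riscv_unaligned_avoid)
  return "avoid";
#elif defined (__riscv_unaligned_slow)
  return "slow";
#elif defined (__riscv_unaligned_fast)
  return "fast";
#else
  return "unknown";   /* not RISC-V, or macros not provided */
#endif
}
```

Code that has both a byte-wise and a word-wise path could use the same
preprocessor tests to pick one at compile time.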

Tested for regressions using rv32/64 multilib with newlib/linux

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Generate
__riscv_unaligned_avoid with value 1 or
__riscv_unaligned_slow with value 1 or
__riscv_unaligned_fast with value 1
* config/riscv/riscv.cc (riscv_option_override): Define
riscv_user_wants_strict_align. Set
riscv_user_wants_strict_align to TARGET_STRICT_ALIGN
* config/riscv/riscv.h: Declare riscv_user_wants_strict_align

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-1.c: Check for
__riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/attribute-4.c: Check for
__riscv_unaligned_avoid
* gcc.target/riscv/attribute-5.c: Check for
__riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/predef-align-1.c: New test.
* gcc.target/riscv/predef-align-2.c: New test.
* gcc.target/riscv/predef-align-3.c: New test.
* gcc.target/riscv/predef-align-4.c: New test.
* gcc.target/riscv/predef-align-5.c: New test.
* gcc.target/riscv/predef-align-6.c: New test.

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
Co-authored-by: Vineet Gupta <vineetg@rivosinc.com>

libgccjit: add support for `restrict` attribute on function parameters

gcc/jit/ChangeLog:
* jit-playback.cc: Remove trailing whitespace characters.
* jit-playback.h: Add get_restrict method.
* jit-recording.cc: Add get_restrict methods.
* jit-recording.h: Add get_restrict methods.
* libgccjit++.h: Add get_restrict methods.
* libgccjit.cc: Add gcc_jit_type_get_restrict.
* libgccjit.h: Declare gcc_jit_type_get_restrict.
* libgccjit.map: Declare gcc_jit_type_get_restrict.

gcc/testsuite/ChangeLog:
* jit.dg/test-restrict.c: Add test for __restrict__ attribute.
* jit.dg/all-non-failing-tests.h: Add test-restrict.c to the list.

gcc/jit/ChangeLog:
* docs/topics/compatibility.rst: Add documentation for LIBGCCJIT_ABI_25.
* docs/topics/types.rst: Add documentation for gcc_jit_type_get_restrict.

Signed-off-by: Guillaume Gomez <guillaume1.gomez@gmail.com>

RISC-V: Add Types to Un-Typed Vector Instructions

Update vector instructions to ensure that no instruction is left
without a type attribute. Create a placeholder type "vector" for
instructions where a type isn't clear.

Tested for regressions using rv32/rv64 gc/gcv multilib with newlib/linux.

gcc/ChangeLog:

* config/riscv/autovec-vls.md: Update types
* config/riscv/riscv.md: Add vector placeholder type
* config/riscv/vector.md: Update types

Reviewed-by: Jeff Law <jlaw@ventanamicro.com>
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>

rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and a
variable version of the built-in.  The RM value is a 2-bit constant int
which specifies the rounding mode to use.  For the immediate versions of
the built-in, the TE field is a 5-bit constant that specifies the value of
the ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
    const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
    const int RM)
  _Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128,
     const int RM)
  _Decimal128 __builtin_dfp_quantize (const int TE, _Decimal128,
     const int RM)

A testcase is added for the new built-in definitions.

gcc/ChangeLog:
* config/rs6000/dfp.md (UNSPEC_DQUAN): New unspec.
(dfp_dqua_<mode>, dfp_dquai_<mode>): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_dqua,
__builtin_dfp_dquai, __builtin_dfp_dquaq, __builtin_dfp_dquaqi):
New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize): New
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_dfp_quantize.

gcc/testsuite/
* gcc.target/powerpc/pr93448.c: New test case.

PR target/93448

analyzer: improve strdup handling [PR105899]

gcc/analyzer/ChangeLog:
PR analyzer/105899
* kf.cc (kf_strdup::impl_call_pre): Set size of
dynamically-allocated buffer. Simulate copying the string from
the source region to the new buffer.

gcc/testsuite/ChangeLog:
PR analyzer/105899
* c-c++-common/analyzer/pr99193-2.c: Add
-Wno-analyzer-too-complex.
* gcc.dg/analyzer/strdup-1.c: Include "analyzer-decls.h".
(test_concrete_strlen): New.
(test_symbolic_strlen): New.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

RISC-V: Fix one ICE for vect test vect-multitypes-5

An ICE occurs when building vect-multitypes-5.c with a command like the one below:

riscv64-unknown-elf-gcc -O3 \
  -march=rv64imafdcv -mabi=lp64d -mcmodel=medlow \
  -fdiagnostics-plain-output -flto -ffat-lto-objects \
  --param riscv-autovec-preference=scalable -Wno-psabi \
  -ftree-vectorize -fno-tree-loop-distribute-patterns \
  -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details \
  gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c -o test.elf -lm

The RTL below is not handled in riscv_legitimize_const_move and
falls through to the default path.  The default force_const_mem
then returns NULL_RTX, and operating on that NULL_RTX triggers
the ICE.

(const:DI
  (plus:DI
    (symbol_ref:DI ("ic") [flags 0x2] <var_decl 0x7fe57740be10 ic>)
    (const_poly_int:DI [16, 16])))

This patch handles this RTL in riscv_legitimize_const_move.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_poly_move): New declaration.
(riscv_legitimize_const_move): Handle ref plus const poly.

RISC-V: Add stub support for existing extensions (unprivileged)

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils supports, even if GCC itself does not use them.

This commit adds stub support for standard unprivileged extensions to
riscv_ext_version_table and their implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c, except for the
not-yet-merged 'Zce', 'Zcmp' and 'Zcmt' support).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from unprivileged extensions.
(riscv_ext_version_table): Add stub support for all unprivileged
extensions supported by Binutils as well as 'Zce', 'Zcmp', 'Zcmt'.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-31.c: New test for a stub unprivileged
extension 'Zcb' with some implications.

RISC-V: Add stub support for existing extensions (vendor)

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils supports, even if GCC itself does not use them.

This commit adds stub support for vendor extensions to
riscv_ext_version_table (no riscv_implied_info entries to add; all
information is copied from Binutils' bfd/elfxx-riscv.c).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_ext_version_table):
Add stub support for all vendor extensions supported by Binutils.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-30.c: New test for a stub
vendor extension 'XVentanaCondOps'.

RISC-V: Add stub support for existing extensions (privileged)

After commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, we have no
guarantee that we can share the same architectural string with Binutils
(specifically, the assembler).

To avoid compilation errors on shared Assembler-C/C++ projects or programs
with inline assembler, GCC should support almost all extensions that
Binutils supports, even if GCC itself does not use them.

As a start, this commit adds stub support for *privileged* extensions to
riscv_ext_version_table and their implications to riscv_implied_info
(all information is copied from Binutils' bfd/elfxx-riscv.c).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add implications from privileged extensions.
(riscv_ext_version_table): Add stub support for all privileged
extensions supported by Binutils.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-29.c: New test for a stub privileged
extension 'Smstateen' with some implications.

RISC-V: Make PR 102957 tests more comprehensive

Commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions and
commit 6f709f79c915a ("[committed] [RISC-V] Fix expected diagnostic messages
in testsuite") "fixed" test failures caused by that change (on pr102957.c,
by testing the error message after the first change).

However, the latter change will partially break the original intent of PR
102957 test case because we wanted to make sure that we can parse a valid
two-letter extension name.

Fortunately, there is a valid two-letter extension name, 'Zk' (standard
scalar cryptography extension superset with NIST algorithm suite).

This commit adds pr102957-2.c to make sure that there will be no errors if
we parse a valid two-letter extension name.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr102957-2.c: New test case using the 'Zk'
extension to continue testing whether we can use valid two-letter
extensions.

RISC-V: Refactor and clean expand_cond_len_{unop,binop,ternop}

This patch refactors the code of expand_cond_len_{unop,binop,ternop}.
It introduces a new unified function, expand_cond_len_op, to do the main work.
The expand_cond_len_{unop,binop,ternop} functions only care about how
to pass the operands to the intrinsic patterns.

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust
* config/riscv/riscv-protos.h (RVV_VUNDEF): Clean.
(get_vlmax_rtx): Exported.
* config/riscv/riscv-v.cc (emit_nonvlmax_fp_ternary_tu_insn): Deleted.
(emit_vlmax_masked_gather_mu_insn): Adjust.
(get_vlmax_rtx): New func.
(expand_load_store): Adjust.
(expand_cond_len_unop): Call expand_cond_len_op.
(expand_cond_len_op): New subroutine.
(expand_cond_len_binop): Call expand_cond_len_op.
(expand_cond_len_ternop): Call expand_cond_len_op.
(expand_lanes_load_store): Adjust.

MAINTAINERS: Add myself to write after approval

ChangeLog:

* MAINTAINERS: Add myself.

tree-ssa-math-opts: Improve uaddc/usubc pattern matching [PR111209]

The uaddc/usubc usual matching is of the .{ADD,SUB}_OVERFLOW pair in the
middle, which adds/subtracts carry-in (from lower limbs) and computes
carry-out (to higher limbs).  Before optimizations (unless user writes
it intentionally that way already), all the steps look the same, but
optimizations simplify the handling of the least significant limb
(one which adds/subtracts 0 carry-in) to just a single
.{ADD,SUB}_OVERFLOW and the handling of the most significant limb
if the computed carry-out is ignored to normal addition/subtraction
of multiple operands.
Now, match_uaddc_usubc has code to turn that least significant
.{ADD,SUB}_OVERFLOW call into .U{ADD,SUB}C call with 0 carry-in if
a more significant limb above it is matched into .U{ADD,SUB}C; this
isn't necessary for functionality, as .ADD_OVERFLOW (x, y) is
functionally equal to .UADDC (x, y, 0) (provided the types of operands
are the same and result is complex type with that type element), and
it also has code to match the most significant limb with ignored carry-out
(in that case one pattern match turns both the penultimate limb pair of
.{ADD,SUB}_OVERFLOW into .U{ADD,SUB}C and the addition/subtraction
of the 4 values (2 carries) into another .U{ADD,SUB}C).

As the following patch shows, what we weren't handling is the case when
one uses either the __builtin_{add,sub}c builtins or hand written forms
thereof (either __builtin_*_overflow or even that written by hand) for
just 2 limbs, where the least significant has 0 carry-in and the most
significant ignores carry-out.  The following patch matches that, e.g.
  _16 = .ADD_OVERFLOW (_1, _2);
  _17 = REALPART_EXPR <_16>;
  _18 = IMAGPART_EXPR <_16>;
  _15 = _3 + _4;
  _12 = _15 + _18;
into
  _16 = .UADDC (_1, _2, 0);
  _17 = REALPART_EXPR <_16>;
  _18 = IMAGPART_EXPR <_16>;
  _19 = .UADDC (_3, _4, _18);
  _12 = REALPART_EXPR <_19>;
so that we can emit better code.
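
For reference, a hand-written 2-limb addition of the kind described above
might look like the following sketch.  It is not the pr79173-12.c testcase,
and add_2limb is a hypothetical name; it uses the GCC/Clang
__builtin_add_overflow built-in for the low limb and ignores the carry-out
of the high limb.

```c
#include <assert.h>

/* Add two 2-limb unsigned numbers (lo, hi).  The carry out of the
   high limb is deliberately discarded.  Hypothetical helper.  */
static void
add_2limb (unsigned long a_lo, unsigned long a_hi,
           unsigned long b_lo, unsigned long b_hi,
           unsigned long *r_lo, unsigned long *r_hi)
{
  unsigned long lo;

  assert (r_lo != 0 && r_hi != 0);

  /* Least significant limb: carry-in is 0, so one overflow check.  */
  unsigned long carry = __builtin_add_overflow (a_lo, b_lo, &lo);
  *r_lo = lo;

  /* Most significant limb: plain additions of two values plus the
     carry; the carry-out is ignored.  */
  *r_hi = a_hi + b_hi + carry;
}
```

The low limb corresponds to the .ADD_OVERFLOW call in the GIMPLE above and
the high limb to the `_3 + _4` and `_15 + _18` additions.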

As the 2 later comments show, we must do that carefully, because the
pass walks the IL from first to last stmt in a bb and we must avoid
pattern matching this way something that should be matched on a later
instruction differently.

2023-08-29  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/79173
PR middle-end/111209
* tree-ssa-math-opts.cc (match_uaddc_usubc): Match also
just 2 limb uaddc/usubc with 0 carry-in on lower limb and ignored
carry-out on higher limb.  Don't match it though if it could be
matched later on 4 argument addition/subtraction.

* gcc.target/i386/pr79173-12.c: New test.

MATCH: Move `(x | y) & (~x ^ y)` over to use bitwise_inverted_equal_p

This moves the match pattern `(x | y) & (~x ^ y)` over to use bitwise_inverted_equal_p.
This also allows optimizing comparisons and catches the missed `(~x | y) & (x ^ y)`
transformation into `~x & y`.
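
Both identities can be checked exhaustively on 8-bit values with a short C
sketch (the function name is hypothetical; the first pattern simplifies to
`x & y`, since `~x ^ y` is set exactly where the bits of x and y agree):

```c
#include <assert.h>

/* Count the (x, y) pairs, restricted to 8 bits, for which either
   identity fails.  Hypothetical helper for illustration.  */
static unsigned
count_identity_failures (void)
{
  unsigned failures = 0;

  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      {
        /* Truncate to 8 bits so the high bits of ~x do not matter.  */
        unsigned char lhs1 = (x | y) & (~x ^ y);
        unsigned char rhs1 = x & y;
        unsigned char lhs2 = (~x | y) & (x ^ y);
        unsigned char rhs2 = ~x & y;

        if (lhs1 != rhs1)
          failures++;
        if (lhs2 != rhs2)
          failures++;
      }

  assert (failures <= 2 * 256 * 256);
  return failures;
}
```

Because the operations are purely bitwise, agreement on all 8-bit inputs
implies agreement at any width.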

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/111147
* match.pd (`(x | y) & (~x ^ y)`): Use bitwise_inverted_equal_p
instead of matching bit_not.

gcc/testsuite/ChangeLog:

PR tree-optimization/111147
* gcc.dg/tree-ssa/cmpbit-4.c: New test.

vect test: Remove xfail for riscv

We are planning to enable "vect" testsuite with scalable vector auto-vectorization.

This case XPASS:
XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1

like ARM SVE.
gcc/testsuite/ChangeLog:

* gcc.dg/vect/no-scevccp-outer-12.c: Add riscv xfail.

arm: Fix bootstrap / add missing initializer in MVE type_suffixes

My recent patch r14-3519-g9bae37ec8dc320 (arm: [MVE intrinsics] add
support for p8 and p16 polynomial types) added a new member to
type_suffix_info, but I forgot to add the corresponding initializer to
type_suffixes.

Committed as obvious.

2023-08-29 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/arm-mve-builtins.cc (type_suffixes): Add missing
initializer.

RISC-V: Fix ASM check of vlmax_switch_vtype-16.c

Notice there is a failure:
FAIL: gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c -O2 scan-assembler-times vsetvli\\s+zero,\\s*zero 2

Change the expected count from "2" to "3"; the generated assembly is correct and better.

Committed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-16.c: Fix ASM check.