Zhangjin Liao [Wed, 23 Aug 2023 14:02:47 +0000 (08:02 -0600)]
[PATCH] RISC-V:add a more appropriate type attribute
Due to the more accurate type attribute added to the clz, ctz, and pcnt
operations in https://github.com/gcc-mirror/gcc/commit/07e2576d6f3 the
same type attribute should be used here.
gcc/ChangeLog:
* config/riscv/bitmanip.md (*<bitmanip_optab>disi2_sext): Add a more
appropriate type attribute.
This patch add conditional unary neg/abs/not autovec patterns to RISC-V backend.
For this C code:
void
test_3 (float *__restrict a, float *__restrict b, int *__restrict pred, int n)
{
for (int i = 0; i < n; i += 1)
{
a[i] = pred[i] ? __builtin_fabsf (b[i]) : a[i];
}
}
Before this patch:
...
vsetvli a7,zero,e32,m1,ta,ma
vfabs.v v2,v2
vmerge.vvm v1,v1,v2,v0
...
After this patch:
...
vsetvli a7,zero,e32,m1,ta,mu
vfabs.v v1,v2,v0.t
...
For int neg/not and FP neg patterns, Defining the corresponding cond_xxx paterns
is enough.
For the FP abs pattern, We need to change the definition of `abs<mode>2` and
`@vcond_mask_<mode><vm>` pattern from define_expand to define_insn_and_split
in order to fuse them into a new pattern `*cond_abs<mode>` at the combine pass.
A fusion process similar to the one below:
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-8.c: New test.
Jan Hubicka [Wed, 23 Aug 2023 09:17:20 +0000 (11:17 +0200)]
Fix handling of static exists in loop_ch
This patch fixes wrong return value in should_duplicate_loop_header_p.
Doing so uncovered suboptimal decisions on some jump threading testcases
where we choose to stop duplicating just before basic block that has zero
cost and duplicating so would be always a win.
This is because the heuristics trying to choose right point to duplicate
all winning blocks and to get loop to be do_while did not account
zero_cost blocks in all cases. The patch simplifies the logic by
simply remembering zero cost blocks and handling them last after
the right stopping point is chosen.
gcc/ChangeLog:
* tree-ssa-loop-ch.cc (enum ch_decision): Fix comment.
(should_duplicate_loop_header_p): Fix return value for static exits.
(ch_base::copy_headers): Improve handling of ch_possible_zero_cost.
Lulu Cheng [Tue, 22 Aug 2023 11:56:21 +0000 (19:56 +0800)]
libffi: Backport of LoongArch support for libffi.
This is a backport of <https://github.com/libffi/libffi/commit/f259a6f6de>,
and contains modifications to commit 5a4774cd4d, as well as the LoongArch
schema portion of commit ee22ecbd11. This is needed for libgo.
libffi/ChangeLog:
PR libffi/108682
* configure.host: Add LoongArch support.
* Makefile.am: Likewise.
* Makefile.in: Regenerate.
* src/loongarch64/ffi.c: New file.
* src/loongarch64/ffitarget.h: New file.
* src/loongarch64/sysv.S: New file.
Kewen Lin [Wed, 23 Aug 2023 05:09:14 +0000 (00:09 -0500)]
vect: Move VMAT_GATHER_SCATTER handlings from final loop nest
Like r14-3317 which moves the handlings on memory access
type VMAT_GATHER_SCATTER in vectorizable_load final loop
nest, this one is to deal with vectorizable_store side.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Move the handlings on
VMAT_GATHER_SCATTER in the final loop nest to its own loop,
and update the final nest accordingly.
Kewen Lin [Wed, 23 Aug 2023 05:09:14 +0000 (00:09 -0500)]
vect: Move VMAT_LOAD_STORE_LANES handlings from final loop nest
Like commit r14-3214 which moves the handlings on memory
access type VMAT_LOAD_STORE_LANES in vectorizable_load
final loop nest, this one is to deal with the function
vectorizable_store.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Move the handlings on
VMAT_LOAD_STORE_LANES in the final loop nest to its own loop,
and update the final nest accordingly.
Kewen Lin [Wed, 23 Aug 2023 05:09:14 +0000 (00:09 -0500)]
vect: Remove some manual release in vectorizable_store
To avoid some duplicates in some follow-up patches on
function vectorizable_store, this patch is to adjust some
existing vec with auto_vec and remove some manual release
invocation. Also refactor a bit and remove some uesless
codes.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_store): Remove vec oprnds,
adjust vec result_chain, vec_oprnd with auto_vec, and adjust
gvec_oprnds with auto_delete_vec.
libgomp, testsuite: Do not call nonstandard functions
The following functions are not standard, and not always available
(e.g., on darwin). They should not be called unless available: gamma,
gammaf, scalb, scalbf, significand, and significandf.
David Malcolm [Tue, 22 Aug 2023 22:36:54 +0000 (18:36 -0400)]
analyzer: reimplement kf_strlen [PR105899]
Reimplement kf_strlen in terms of the new string scanning
implementation, sharing strlen's implementation with
__analyzer_get_strlen.
gcc/analyzer/ChangeLog:
PR analyzer/105899
* kf-analyzer.cc (class kf_analyzer_get_strlen): Move to kf.cc.
(register_known_analyzer_functions): Use make_kf_strlen.
* kf.cc (class kf_strlen::impl_call_pre): Replace with
implementation of kf_analyzer_get_strlen from kf-analyzer.cc.
Handle "UNKNOWN" return from check_for_null_terminated_string_arg
by falling back to a conjured svalue.
(make_kf_strlen): New.
(register_known_functions): Use make_kf_strlen.
* known-function-manager.h (make_kf_strlen): New decl.
gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/null-terminated-strings-1.c: Update expected
results on symbolic values.
* gcc.dg/analyzer/strlen-1.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jason Merrill [Fri, 18 Aug 2023 22:24:53 +0000 (18:24 -0400)]
c++: maybe_substitute_reqs_for fix
While working on PR109751 I found that maybe_substitute_reqs_for was doing
the wrong thing for a non-template friend, substituting in the template args
of the scope's original template rather than those of the instantiation.
This didn't end up being necessary to fix the PR, but it's still an
improvement.
gcc/cp/ChangeLog:
* pt.cc (outer_template_args): Handle non-template argument.
* constraint.cc (maybe_substitute_reqs_for): Pass decl to it.
* cp-tree.h (outer_template_args): Adjust.
Jason Merrill [Thu, 17 Aug 2023 15:36:23 +0000 (11:36 -0400)]
c++: constrained hidden friends [PR109751]
r13-4035 avoided a problem with overloading of constrained hidden friends by
checking satisfaction, but checking satisfaction early is inconsistent with
the usual late checking and can lead to hard errors, so let's not do that
after all.
We were wrongly treating the different instantiations of the same friend
template as the same function because maybe_substitute_reqs_for was failing
to actually substitute in the case of a non-template friend. But we don't
actually need to do the substitution anyway, because [temp.friend] says that
such a friend can't be the same as any other declaration.
After fixing that, instead of a redefinition error we got an ambiguous
overload error, fixed by allowing constrained hidden friends to coexist
until overload resolution, at which point they probably won't be in the same
ADL overload set anyway.
And we avoid mangling collisions by following the proposed mangling for
these friends as a member function with an extra 'F' before the name. I
demangle this by just adding [friend] to the name of the function because
it's not feasible to reconstruct the actual scope of the function since the
mangling ABI doesn't distinguish between class and namespace scopes.
PR c++/109751
gcc/cp/ChangeLog:
* cp-tree.h (member_like_constrained_friend_p): Declare.
* decl.cc (member_like_constrained_friend_p): New.
(function_requirements_equivalent_p): Check it.
(duplicate_decls): Check it.
(grokfndecl): Check friend template constraints.
* mangle.cc (decl_mangling_context): Check it.
(write_unqualified_name): Check it.
* pt.cc (uses_outer_template_parms_in_constraints): Fix for friends.
(tsubst_friend_function): Don't check satisfaction.
This adds multiarch support to the RISC-V port so that bootstraps work with
Debian out-of-the-box. Without this patch the stage1 compiler is unable to
find headers/libraries when building the stage1 runtime.
This is functionally (and possibly textually) equivalent to Debian's fix for
the same problem.
* libgomp.texi (OpenMP 5.2 status): Add depobj with
destroy-var argument as 'N'. Mark defaultmap with
'all' category as 'Y'.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/defaultmap-1.f90: Update dg-error.
* c-c++-common/gomp/defaultmap-5.c: New test.
* c-c++-common/gomp/defaultmap-6.c: New test.
* gfortran.dg/gomp/defaultmap-10.f90: New test.
* gfortran.dg/gomp/defaultmap-9.f90: New test.
Richard Biener [Tue, 22 Aug 2023 12:28:00 +0000 (14:28 +0200)]
Simplify intereaved store vectorization processing
When doing interleaving we perform code generation when visiting the
last store of a chain. We keep track of this via DR_GROUP_STORE_COUNT,
the following localizes this to the caller of vectorizable_store,
also avoing redundant non-processing of the other stores.
* tree-vect-stmts.cc (vectorizable_store): Do not bump
DR_GROUP_STORE_COUNT here. Remove early out.
(vect_transform_stmt): Only call vectorizable_store on
the last element of an interleaving chain.
to a vector permutation. The following implements this as
match.pd pattern, improving code generation on x86_64.
On the RTL level we face the issue that backend patterns inconsistently
use vec_merge and vec_select of vec_concat to represent permutes.
I think using a (supported) permute is almost always better
than an extract plus insert, maybe excluding the case we extract
element zero and that's aliased to a register that can be used
directly for insertion (not sure how to query that).
The patch FAILs one case in gcc.target/i386/avx512fp16-vmovsh-1a.c
where we now expand from
producing a vpblendw instruction instead of the expected vmovsh. That's
either a missed vec_perm_const expansion optimization or even better,
an improvement - Zen4 for example has 4 ports to execute vpblendw
but only 3 for executing vmovsh and both instructions have the same size.
The patch XFAILs the sub-testcase.
PR tree-optimization/94864
PR tree-optimization/94865
PR tree-optimization/93080
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
for vector insertion from vector extraction.
Harald Anlauf [Mon, 21 Aug 2023 19:23:57 +0000 (21:23 +0200)]
Fortran: implement vector sections in DATA statements [PR49588]
gcc/fortran/ChangeLog:
PR fortran/49588
* data.cc (gfc_advance_section): Derive next index set and next offset
into DATA variable also for array references using vector sections.
Use auxiliary array to keep track of offsets into indexing vectors.
(gfc_get_section_index): Set up initial indices also for DATA variables
with array references using vector sections.
* data.h (gfc_get_section_index): Adjust prototype.
(gfc_advance_section): Likewise.
* resolve.cc (check_data_variable): Pass vector offsets.
gcc/testsuite/ChangeLog:
PR fortran/49588
* gfortran.dg/data_vector_section.f90: New test.
Lehua Ding [Tue, 22 Aug 2023 02:54:08 +0000 (10:54 +0800)]
RISC-V: Change fnms testcases assertion to xfail
Hi,
This patch fixes inappropriate assertions in fnms testcases since
we want to generate .COND_FNMS but actually generate .FNMS + .VCOND_MASK.
A patch to do this optimization will follow.
David Malcolm [Tue, 22 Aug 2023 01:13:19 +0000 (21:13 -0400)]
analyzer: check format strings for null termination [PR105899]
This patch extends -fanalyzer to check the format strings of calls
to functions marked with '__attribute__ ((format...))'.
The only checking done in this patch is to check that the format string
is a valid null-terminated string; this patch doesn't attempt to check
the content of the format string.
gcc/analyzer/ChangeLog:
PR analyzer/105899
* call-details.cc (call_details::call_details): New ctor.
* call-details.h (call_details::call_details): New ctor decl.
(struct call_arg_details): Move here from region-model.cc.
* region-model.cc (region_model::check_call_format_attr): New.
(region_model::check_call_args): Call it.
(struct call_arg_details): Move it to call-details.h.
* region-model.h (region_model::check_call_format_attr): New decl.
gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/attr-format-1.c: New test.
* gcc.dg/analyzer/sprintf-1.c: Update expected results for
now-passing tests.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Tue, 22 Aug 2023 01:13:19 +0000 (21:13 -0400)]
analyzer: replace -Wanalyzer-unterminated-string with scan_for_null_terminator [PR105899]
In r14-3169-g325f9e88802daa I added check_for_null_terminated_string_arg
to -fanalyzer, calling it in various places, with a sole check for
unterminated string constants, adding -Wanalyzer-unterminated-string for
this case.
This patch adds region_model::scan_for_null_terminator, which simulates
scanning memory for a zero byte, complaining about uninitiliazed bytes
and out-of-range accesses seen before any zero byte is seen.
This more flexible approach catches the issues we saw before with
-Wanalyzer-unterminated-string, and also catches uninitialized runs
of bytes, and I believe will be a better way to build checking of C
string operations in the analyzer.
Given that the patch makes -Wanalyzer-unterminated-string redundant
and that this option was only in trunk for 10 days and has no known
users, the patch simply removes the option without a compatibility
fallback.
The patch uses custom events and notes to provide context on where
the issues are coming from. For example, given:
null-terminated-strings-1.c: In function ‘test_partially_initialized’:
null-terminated-strings-1.c:71:3: warning: use of uninitialized value ‘buf[1]’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
71 | __analyzer_get_strlen (buf);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
‘test_partially_initialized’: events 1-3
|
| 69 | char buf[16];
| | ^~~
| | |
| | (1) region created on stack here
| 70 | buf[0] = 'a';
| 71 | __analyzer_get_strlen (buf);
| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| | (2) while looking for null terminator for argument 1 (‘&buf’) of ‘__analyzer_get_strlen’...
| | (3) use of uninitialized value ‘buf[1]’ here
|
analyzer-decls.h:59:22: note: argument 1 of ‘__analyzer_get_strlen’ must be a pointer to a null-terminated string
59 | extern __SIZE_TYPE__ __analyzer_get_strlen (const char *ptr);
| ^~~~~~~~~~~~~~~~~~~~~
gcc/analyzer/ChangeLog:
PR analyzer/105899
* analyzer.opt (Wanalyzer-unterminated-string): Delete.
* call-details.cc
(call_details::check_for_null_terminated_string_arg): Convert
return type from void to const svalue *. Add param "out_sval".
* call-details.h
(call_details::check_for_null_terminated_string_arg): Likewise.
* kf-analyzer.cc (kf_analyzer_get_strlen::impl_call_pre): Wire up
to result of check_for_null_terminated_string_arg.
* region-model.cc (get_strlen): Delete.
(class unterminated_string_arg): Delete.
(struct fragment): New.
(class iterable_cluster): New.
(region_model::get_store_bytes): New.
(get_tree_for_byte_offset): New.
(region_model::scan_for_null_terminator): New.
(region_model::check_for_null_terminated_string_arg): Convert
return type from void to const svalue *. Add param "out_sval".
Reimplement in terms of scan_for_null_terminator, dropping the
special-case for -Wanalyzer-unterminated-string.
* region-model.h (region_model::get_store_bytes): New decl.
(region_model::scan_for_null_terminator): New decl.
(region_model::check_for_null_terminated_string_arg): Convert
return type from void to const svalue *. Add param "out_sval".
* store.cc (concrete_binding::get_byte_range): New.
* store.h (concrete_binding::get_byte_range): New decl.
(store_manager::get_concrete_binding): New overload.
Edwin Lu [Mon, 21 Aug 2023 21:20:24 +0000 (15:20 -0600)]
[PATCH] RISC-V: Add Types to Missing Bitmanip Instructions
This patch updates the bitmanip instructions to ensure that no insn is left
without a type attribute. Updates a total of 8 insns to have type "bitmanip"
Tested for regressions using rv32/64 multilib with newlib/linux.
gcc/Changelog:
* config/riscv/bitmanip.md: Added bitmanip type to insns
that are missing types.
Jeff Law [Mon, 21 Aug 2023 17:20:28 +0000 (11:20 -0600)]
[RISCV][committed] Remove spurious newline in ztso sequence
amo-table-ztso-load-3 the coordination branch after merging up the Ztso changes
due to a spurious newline in the output causing scan-function-body to fail.
There's probably an over-zealous .* or similar regexp in the framework. I
didn't see it in a quick scan, but could have easily missed it.
Regardless, fixing the extraneous newline is easy :-)
Tsukasa OI [Mon, 21 Aug 2023 13:31:52 +0000 (07:31 -0600)]
[PATCH 2/2] RISC-V: Add quotes to #error messages (all)
From: Tsukasa OI <research_trasio@irq.a4lg.com>
In commit 1aaf3a64e92a ("[PATCH] RISC-V: Deduplicate #error messages in
testsuite"), the author made a mistake to miss the test after adding
quotes around extension names. To avoid future errors and for consistency
with other #error uses in the RISC-V testsuite, this commit quotes all
unquoted #error messages.
Tsukasa OI [Mon, 21 Aug 2023 13:31:13 +0000 (07:31 -0600)]
[PATCH 1/2] RISC-V: Add quotes to #error messages
In commit 1aaf3a64e92a ("[PATCH] RISC-V: Deduplicate #error messages in
testsuite"), the author made a mistake to miss the test after adding
quotes around extension names. To avoid future errors and for consistency
with other #error uses in the RISC-V testsuite, this commit quotes #error
messages where necessary to avoid current test case failures.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr65947-7.c: Add target check aarch64*-*-* and scan vect
dump for pattern "optimizing condition reduction with FOLD_EXTRACT_LAST"
for targets that support vect_fold_extract_last.
Richard Biener [Mon, 21 Aug 2023 11:09:31 +0000 (13:09 +0200)]
Fix gcc.dg/vect/bb-slp-46.c FAIL
When relaxing vectorization of possibly overflowing reductions I
failed to update a testcase that will now vectorize and no longer
test for what it was written for. The following replaces the
vectorizable add with a division.
* gcc.dg/vect/bb-slp-46.c: Use division instead of addition
to avoid reduction vectorization.
In valid_mask_for_fold_vec_perm_cst we set arg_npatterns always
to VECTOR_CST_NPATTERNS (arg0) because of (q1 & 0) == 0:
/* Ensure that the stepped sequence always selects from the same
input pattern. */
unsigned arg_npatterns
= ((q1 & 0) == 0) ? VECTOR_CST_NPATTERNS (arg0)
: VECTOR_CST_NPATTERNS (arg1);
resulting in wrong code-gen issues.
The patch fixes this by changing the condition to (q1 & 1) == 0.
gcc/ChangeLog:
PR tree-optimization/111048
* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Set arg_npatterns
correctly.
(fold_vec_perm_cst): Remove workaround and again call
valid_mask_fold_vec_perm_cst_p for both VLS and VLA vectors.
(test_fold_vec_perm_cst::test_nunits_min_4): Add test-case.
Richard Biener [Mon, 21 Aug 2023 09:07:18 +0000 (11:07 +0200)]
tree-optimization/111082 - bogus promoted min
vectorize_slp_instance_root_stmt promotes operations with undefined
overflow to unsigned arithmetic but fails to consider operations
that do not overflow like MIN which it turned into MIN with wrong
signedness and in the case of the PR an unsupported operation.
The following rectifies this.
PR tree-optimization/111082
* tree-vect-slp.cc (vectorize_slp_instance_root_stmt): Only
pun operations that can overflow.
Jonathan Wakely [Mon, 21 Aug 2023 09:36:13 +0000 (10:36 +0100)]
libstdc++: Remove reliance on unspecified behaviour in std::rethrow_if_nested test
This test case calls std::set_terminate while there is an active
exception. Since LWG 2111 it is unspecified which terminate handler is
used when std::nested_exception::rethrow_nested() calls std::terminate.
With libsupc++ the global handler changed by std::set_terminate is used,
but libc++abi uses the active exception's handler (the one that was
current when the exception was first thrown).
Adjust the test case so that it works with either implementation choice.
So that the process doesn't exit cleanly if std::terminate happens
sooner than expected, use a global variable to control when the "clean
terminate" behaviour happens.
libstdc++-v3/ChangeLog:
* testsuite/18_support/nested_exception/rethrow_if_nested-term.cc:
Call std::set_terminate before throwing the nested exception.
Juzhe-Zhong [Mon, 21 Aug 2023 01:04:53 +0000 (09:04 +0800)]
LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend
This patch exports 'compute_antinout_edge' and 'compute_earliest' as global scope
which is going to be used in VSETVL PASS of RISC-V backend.
The demand fusion is the fusion of VSETVL information to emit VSETVL which dominate and pre-config for most
of the RVV instructions in order to elide redundant VSETVLs.
For exmaple:
for
for
for
if (cond}
VSETVL demand 1: SEW/LMUL = 16 and TU policy
else
VSETVL demand 2: SEW = 32
VSETVL pass should be able to fuse demand 1 and demand 2 into new demand: SEW = 32, LMUL = M2, TU policy.
Then emit such VSETVL at the outmost of the for loop to get the most optimal codegen and run-time execution.
Currenty the VSETVL PASS Phase 3 (demand fusion) is really messy and un-reliable as well as un-maintainable.
And, I recently read dragon book and morgan's book again, I found there "earliest" can allow us to do the
demand fusion in a very reliable and optimal way.
So, this patch exports these 2 functions which are very helpful for VSETVL pass.
gcc/ChangeLog:
* lcm.cc (compute_antinout_edge): Export as global use.
(compute_earliest): Ditto.
(compute_rev_insert_delete): Ditto.
* lcm.h (compute_antinout_edge): Ditto.
(compute_earliest): Ditto.
Andrew Pinski [Mon, 21 Aug 2023 00:22:27 +0000 (17:22 -0700)]
MATCH: [PR111002] Sink view_convert for vec_cond
Like convert we can sink view_convert into vec_cond but
we can only do it if the element types are nop_conversions.
This is to allow conversion between signed and unsigned types only.
Rather than between integer and float types which mess up the vec_cond
so that isel does not understand `a?-1:0` is still that.
OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.
PR tree-optimization/111002
gcc/ChangeLog:
* match.pd (view_convert(vec_cond(a,b,c))): New pattern.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/cond_convert_8.c: New test.
liuhongt [Fri, 18 Aug 2023 02:30:35 +0000 (10:30 +0800)]
Support -march=gracemont
Alderlake-N is E-core only, add it as an alias of Alderlake.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_intel_cpu): Detect
Alderlake-N.
* common/config/i386/i386-common.cc (alias_table): Support
-march=gracemont as an alias of -march=alderlake.
Gaius Mulley [Sun, 20 Aug 2023 22:27:34 +0000 (23:27 +0100)]
PR modula2/111085 nexttoward and nexttowardf contain incorrect definitions
The definition for procedures nexttoward and nexttowardf contain
second incorrect parameter and return types. This bug was
discovered when attempting to resolve PR 108143 and is applied
separately and prior to PR 108143.
gcc/m2/ChangeLog:
PR modula2/111085
* gm2-libs/Builtins.def (nexttoward): Alter the second
parameter to LONGREAL.
(nexttowardf): Alter the second parameter to LONGREAL.
* gm2-libs/Builtins.mod (nexttoward): Alter the second
parameter to LONGREAL.
(nexttowardf): Alter the second parameter to LONGREAL.
* gm2-libs/cbuiltin.def (nexttoward): Alter the second
parameter to LONGREAL.
(nexttowardf): Alter the second parameter to LONGREAL.
testsuite: Adjust g++.dg/gomp/pr58567.C to new compiler message
Commit 92d1425ca780 "c++: redundant targ coercion for var/alias tmpls"
changed the compiler error message in this testcase from
<source>: In instantiation of 'void foo() [with T = int]':
<source>:14:11: required from here
<source>:8:22: error: 'int' is not a class, struct, or union type
<source>:8:22: error: 'int' is not a class, struct, or union type
<source>:8:22: error: 'int' is not a class, struct, or union type
<source>:8:3: error: expected iteration declaration or initialization
compiler exited with status 1
to:
<source>: In instantiation of 'void foo() [with T = int]':
<source>:14:11: required from here
<source>:8:22: error: 'int' is not a class, struct, or union type
<source>:8:3: error: invalid type for iteration variable 'i'
compiler exited with status 1
Excess errors:
<source>:8:3: error: invalid type for iteration variable 'i'
Andrew Pinski analysed the issue in PR 110756 and considered that it was a
testsuite issue in that the error message changed slightly. Also, it's a
better error message.
Therefore, we only need to adjust the testcase to expect the new message.
gcc/testsuite/ChangeLog:
PR testsuite/110756
* g++.dg/gomp/pr58567.C: Adjust to new compiler error message.
Testsuite, plugin: make testcase pattern more flexible
On Darwin, the message recorded in the sarif file contains:
"message": {"text": "Segmentation fault: 11"}
which is different from, e.g., linux:
"message": {"text": "Segmentation fault"}
Adjusting the testcase pattern to be a little more flexible.
Uros Bizjak [Sun, 20 Aug 2023 15:52:22 +0000 (17:52 +0200)]
i386: Micro-optimize ix86_expand_sse_extend
Partial vector src is forced to a register as ops[1], we can use it
instead of SRC in the call to ix86_expand_sse_cmp. This change avoids
forcing operand[1] to a register in sign/zero-extend expanders.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_sse_extend): Use ops[1]
instead of src in the call to ix86_expand_sse_cmp.
* config/i386/sse.md (<any_extend:insn>v8qiv8hi2): Do not
force operands[1] to a register.
(<any_extend:insn>v4hiv4si2): Ditto.
(<any_extend:insn>v2siv2di2): Ditto.
- Import dmd v2.105.0-beta.1.
- Added predefined version identifier VisionOS (ignored by GDC).
- Functions can no longer have `enum` storage class.
- The deprecation of the `body` keyword has been reverted, it is
now an obsolete feature.
- The error for `scope class` has been reverted, it is now an
obsolete feature.
D runtime changes:
- Import druntime v2.105.0-beta.1.
Phobos changes:
- Import phobos v2.105.0-beta.1.
- AliasSeq has been removed from std.math.
- extern(C) getdelim and getline have been removed from
std.stdio.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd 26f049fb26.
* dmd/VERSION: Bump version to v2.105.0-beta.1.
* d-codegen.cc (get_frameinfo): Check useGC in condition.
* d-lang.cc (d_handle_option): Set obsolete parameter when compiling
with -Wall.
(d_post_options): Set useGC to false when compiling with
-fno-druntime. Propagate obsolete flag to compileEnv.
* expr.cc (ExprVisitor::visit (CatExp *)): Check useGC in condition.
On macOS, system headers redefine by default some macros (memcpy,
memmove, etc) to checked versions, which defeats the analyzer. We
want to turn this off.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104042
Andrew Pinski [Tue, 15 Aug 2023 01:35:53 +0000 (18:35 -0700)]
MATCH: Sink convert for vec_cond
Convert be sinked into a vec_cond if both sides
fold. Unlike other unary operations, we need to check that we still can handle
this vec_cond's first operand is the same as the new truth type.
I tried a few different versions of this patch:
view_convert to the new truth_type but that does not work as we always support all vec_cond
afterwards.
using expand_vec_cond_expr_p; but that would allow too much.
I also tried to see if view_convert can be handled here but we end up with:
_3 = VEC_COND_EXPR <_2, { Nan(-1), Nan(-1), Nan(-1), Nan(-1) }, { 0.0, 0.0, 0.0, 0.0 }>;
Which isel does not know how to handle as just being a view_convert from `vector(4) <signed-boolean:32>`
to `vector(4) float` and causes a regression with `g++.target/i386/pr88152.C`
Note, in the case of the SVE testcase, we will sink negate after the convert and be able
to remove a few extra instructions in the end.
Also with this change gcc.target/aarch64/sve/cond_unary_5.c will now pass.
Committed as approved after a bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.
Eric Gallager [Wed, 25 May 2022 16:45:33 +0000 (12:45 -0400)]
improve error when /usr/include isn't found [PR90835]
This is a pretty simple patch that ought to help Darwin users understand
better why their build is failing when they forget to pass the
--with-sysroot= flag to configure.
gcc/ChangeLog:
PR target/90835
* Makefile.in: improve error message when /usr/include is
missing
Tomas Kalibera [Sun, 20 Aug 2023 02:16:16 +0000 (02:16 +0000)]
Fix format attribute for printf
Since a long time (GCC 4.4?) GCC does support annotating functions
with either the format attribute "gnu_printf" or "ms_printf" to
distinguish between different format string interpretations.
However, it seems like the attribute is ignored for the "printf"
symbol; regardless what the function declaration says, GCC treats
it as "ms_printf". This has become an issue now that mingw-w64
supports using the UCRT instead of msvcrt.dll, and in this case
the stdio functions are declared with the gnu_printf attribute,
and inttypes.h uses the same format specifiers as in GNU mode.
void function(void) {
long long unsigned x = 42;
othername("%llu\n", x);
printf("%llu\n", x);
}
$ x86_64-w64-mingw32-gcc -c -Wformat format.c
format.c: In function 'function':
format.c:7:15: warning: unknown conversion type character 'l' in format [-Wformat=]
7 | printf("%llu\n", x);
| ^
format.c:7:12: warning: too many arguments for format [-Wformat-extra-args]
7 | printf("%llu\n", x);
| ^~~~~~~~
Note how both functions, printf and othername, are declare with
identical gnu_printf format attributes - GCC does take this into
account for "othername" and doesn't produce a warning, but GCC
seems to disregard the attribute in the printf declaration and
behave as if it was declared as ms_printf.
If the printf function declaration is changed into a static inline
function, the actual attribute used is honored though.
gcc/c-family/ChangeLog:
PR c/95130
* c-format.cc: skip default format for printf symbol if
explicitly declared by prototype.
Signed-off-by: Tomas Kalibera <tomas.kalibera@gmail.com> Signed-off-by: Jonathan Yong <10walls@gmail.com>
Tobias Burnus [Sat, 19 Aug 2023 05:49:06 +0000 (07:49 +0200)]
omp-expand.cc: Fix wrong code with non-rectangular loop nest [PR111017]
Before commit r12-5295-g47de0b56ee455e, all gimple_build_cond in
expand_omp_for_* were inserted with
gsi_insert_before (gsi_p, cond_stmt, GSI_SAME_STMT);
except the one dealing with the multiplicative factor that was
gsi_insert_after (gsi, cond_stmt, GSI_CONTINUE_LINKING);
That commit for PR103208 fixed the issue of some missing regimplify of
operands of GIMPLE_CONDs by moving the condition handling to the new function
expand_omp_build_cond. While that function has an 'bool after = false'
argument to switch between the two variants.
However, all callers ommited this argument. This commit reinstates the
prior behavior by passing 'true' for the factor != 0 condition, fixing
the included testcase.
PR middle-end/111017
gcc/
* omp-expand.cc (expand_omp_for_init_vars): Pass after=true
to expand_omp_build_cond for 'factor != 0' condition, resulting
in pre-r12-5295-g47de0b56ee455e code for the gimple insert.
libgomp/
* testsuite/libgomp.c-c++-common/non-rect-loop-1.c: New test.
Jonathan Wakely [Fri, 18 Aug 2023 12:10:15 +0000 (13:10 +0100)]
libstdc++: Revert pre-C++23 support for 16-bit float types [PR111060]
In r14-3304-g1a566fddea212a and r14-3305-g6cf214b4fc97f5 I tried to
enable std::format for 16-bit float types before C++23. This causes
errors for targets where the types are defined but can't actually be
used, e.g. i686 without sse2.
Make the std::numeric_limits and std::formatter specializations for
_Float16 and __bfloat16_t depend on the __STDCPP_FLOAT16_T__ and
__STDCPP_BFLOAT16_T__ macros again, so they're only defined for C++23
when the type is fully supported. This is OK because the main point of
my earlier commits was to add better support for _Float32 and _Float64.
It seems fine for the new 16-bit types to only be supported for C++23,
as they were never present before GCC 13 anyway.
libstdc++-v3/ChangeLog:
PR target/111060
* include/std/format (formatter): Only define specializations
for 16-bit floating-point types for C++23.
* include/std/limits (numeric_limits): Likewise.
If GCC is tested with a sysroot which doesn't contain a Python
installation (e.g., with a command such as
"make check-gcc-c FLAGS_UNDER_TEST="--sysroot=/some/path"), but there's
a python3-config in $PATH, then the testsuite will pick up the host's
Python.h which can't actually be used:
Executing on host: python3-config --includes (timeout = 300)
spawn -ignore SIGHUP python3-config --includes
-I/usr/include/python3.10 -I/usr/include/python3.10
Executing on host: /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc --sysroot=/some/sysroot/libc -Wl,-dynamic-linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-rpath=/some/sysroot/libc/lib /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c -fdiagnostics-plain-output -fplugin=./analyzer_cpython_plugin.so -fanalyzer -I/usr/include/python3.10 -I/usr/include/python3.10 -S -o cpython-plugin-test-2.s (timeout = 600)
spawn -ignore SIGHUP /some/sysroot/bin/aarch64-unknown-linux-gnu-gcc --sysroot=/some/sysroot/libc -Wl,-dynamic-linker=/some/sysroot/libc/lib/ld-linux-aarch64.so.1 -Wl,-rpath=/some/sysroot/libc/lib /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c -fdiagnostics-plain-output -fplugin=./analyzer_cpython_plugin.so -fanalyzer -I/usr/include/python3.10 -I/usr/include/python3.10 -S -o cpython-plugin-test-2.s
In file included from /usr/include/python3.10/Python.h:8,
from /some/src/gcc.git/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c:8:
/usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-gnu/python3.10/pyconfig.h: No such file or directory
compilation terminated.
compiler exited with status 1
This problem causes these testsuite failures:
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 17)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 18)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 21)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 31)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 32)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 35)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 45)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 55)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 63)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 66)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 68)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for warnings, line 69)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for excess errors)
Excess errors:
/usr/include/python3.10/pyconfig.h:9:12: fatal error: aarch64-linux-gnu/python3.10/pyconfig.h: No such file or directory
compilation terminated.
So try to compile a test file so that the testcase can be marked as
unsupported instead.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (dg-require-python-h): Test
whether Python.h can really be used.
Uros Bizjak [Fri, 18 Aug 2023 17:06:38 +0000 (19:06 +0200)]
i386: Use PUNPCKL?? to implement vector extend and zero_extend for TARGET_SSE2.
Implement vector extend and zero_extend functionality for TARGET_SSE2 using
PUNPCKL?? family of instructions. The code for e.g. zero-extend from V2SI to
V2DImode improves from:
* config/i386/i386-expand.cc (ix86_split_mmx_punpck):
Also handle V2QImode.
(ix86_expand_sse_extend): New function.
* config/i386/i386-protos.h (ix86_expand_sse_extend): New prototype.
* config/i386/mmx.md (<any_extend:insn>v4qiv4hi2): Enable for
TARGET_SSE2. Expand through ix86_expand_sse_extend for !TARGET_SSE4_1.
(<any_extend:insn>v2hiv2si2): Ditto.
(<any_extend:insn>v2qiv2hi2): Ditto.
* config/i386/sse.md (<any_extend:insn>v8qiv8hi2): Ditto.
(<any_extend:insn>v4hiv4si2): Ditto.
(<any_extend:insn>v2siv2di2): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr111023-2.c: New test.
* gcc.target/i386/pr111023-4b.c: New test.
* gcc.target/i386/pr111023-8b.c: New test.
* gcc.target/i386/pr111023.c: New test.
Aldy Hernandez [Fri, 18 Aug 2023 10:41:46 +0000 (12:41 +0200)]
[irange] Return FALSE if updated bitmask is unchanged [PR110753]
The mask/value pair we track in the irange is a bit fickle in that it
can sometimes contradict the bitmask inherent in the range. This can
happen when a series of calculations yield a combination such as:
[3, 1000] MASK 0xfffffffe VALUE 0x0
The mask/value above implies that the lowest bit is a known 0, which
would exclude the 3 in the range. At one time we tried keeping mask
and ranges 100% consistent, but the performance penalty was too high
(5% in VRP). Also, it's unclear whether the intersection of two
incompatible known bits should make the whole range undefined, or
just the contradicting bits. This is all documented in
irange::get_bitmask(). We could revisit both of these assumptions
in the future.
In this testcase IPA ends up with a range where the lower 2 bits are
expected to be 0, but the range is [1,1].
[irange] long int [1, 1] MASK 0xfffffffffffffffc VALUE 0x0
This causes irange::union_bitmask() to think an update occurred, when
no semantic change happened, thus triggering an assert in IPA-cp. We
could get rid of the assert, but it's cleaner to make
irange::{union,intersect}_bitmask always tell the truth. Beside, the
ranger's cache also depends on union being truthful.
PR ipa/110753
gcc/ChangeLog:
* value-range.cc (irange::union_bitmask): Return FALSE if updated
bitmask is semantically equivalent to the original mask.
(irange::intersect_bitmask): Same.
(irange::get_bitmask): Add comment.
Richard Biener [Thu, 17 Aug 2023 13:21:33 +0000 (15:21 +0200)]
tree-optimization/111019 - invariant motion and aliasing
The following fixes a bad choice in representing things to the alias
oracle by LIM which while correct in pieces is inconsistent with itself.
When canonicalizing a ref to a bare deref instead of leaving the base
object and the extracted offset the same and just substituting an
alternate ref the following replaces the base and the offset as well,
avoiding the confusion that otherwise will arise in
aliasing_matching_component_refs_p.
PR tree-optimization/111019
* tree-ssa-loop-im.cc (gather_mem_refs_stmt): When canonicalizing
also scrap base and offset in case the ref is indirect.
Jonathan Wakely [Fri, 18 Aug 2023 10:28:32 +0000 (11:28 +0100)]
libstdc++: Replace non-type-dependent uses of wchar_t in <format> and <chrono>
This is one more piece of the rework to make wchar_t support in
std::format depend on _GLIBCXX_USE_WCHAR_T.
In <format> the __to_wstring_numeric function is called with arguments
that aren't type-dependent, so a declaration needs to be available, or
the calls need to be guarded by _GLIBCXX_USE_WCHAR_T.
In <chrono> there is a similarly non-type-dependent call to std::format
with a wchar_t format string, which is ill-formed when the wchar_t
overloads of std::format are not declared. Use _GLIBCXX_WIDEN to make it
type-dependent.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (operator<<): Make uses of wide
strings with streams and std::format type-dependent on _CharT.
* include/std/format [!_GLIBCXX_USE_WCHAR_T] Do not use
__to_wstring_numeric.
Kewen Lin [Fri, 18 Aug 2023 10:03:40 +0000 (05:03 -0500)]
Makefile.in: Make TM_P_H depend on $(TREE_H) [PR111021]
As PR111021 shows, the below ${port}-protos.h include tree.h
for code_helper and tree_code:
arm/arm-protos.h:#include "tree.h"
cris/cris-protos.h:#include "tree.h" (H-P removed this in r14-3218)
microblaze/microblaze-protos.h:#include "tree.h"
rl78/rl78-protos.h:#include "tree.h"
stormy16/stormy16-protos.h:#include "tree.h"
, when compiling build/gencondmd.cc, the include hierarchy
makes it depend on tm_p.h -> ${port}-protos.h -> tree.h,
which further includes (depends on) some files that are
generated during the building, such as: all-tree.def,
tree-check.h and so on. The previous commit r14-3215
should already force build/gencondmd.cc to depend on
${TREE_H}, so the reported build failure should be gone.
But for a long term maintenance, especially one day some
build/xxx.cc requires tm_p.h but not recog.h, the ${TREE_H}
dependence could be missed and a build failure will show
up. So this patch is to make TM_P_H depend on $(TREE_H),
any new build/xxx.cc depending on tm_p.h will be able to
consider ${TREE_H}.
It's tested with cross-builds for the affected ports with
steps:
1) dropped the fix r14-3215;
2) reproduced the build failure with serial build;
3) applied this patch, serial built and verified all passed;
4) added back r14-3215, serial built and verified all passed;
PR bootstrap/111021
gcc/ChangeLog:
* Makefile.in (TM_P_H): Add $(TREE_H) as dependence.
Kewen Lin [Fri, 18 Aug 2023 10:00:05 +0000 (05:00 -0500)]
vect: Factor out the handling on scatter store having gs_info.decl
Similar to the existing function vect_build_gather_load_calls,
this patch is to factor out the handling on scatter store
having gs_info.decl to vect_build_scatter_store_calls which
is a new function. It also does some minor refactoring like
moving some variables' declarations close to their uses and
restrict the scope for some of them etc.
It's a pre-patch for upcoming vectorizable_store re-structuring
for costing.
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_build_scatter_store_calls): New, factor
out from ...
(vectorizable_store): ... here.
Jonathan Wakely [Fri, 18 Aug 2023 08:33:23 +0000 (09:33 +0100)]
libstdc++: Fix incomplete rework of wchar_t support in std::format
r14-3300-g023a62b77f999b left make_wformat_args and some uses of
std::wformat_context unguarded by _GLIBCXX_USE_WCHAR_T.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (operator<<): Use __format_context.
* include/std/format (__format::__format_context): New alias
template.
[!_GLIBCXX_USE_WCHAR_T] (wformat_args, make_wformat_arg):
Disable.
Kewen Lin [Fri, 18 Aug 2023 07:26:52 +0000 (02:26 -0500)]
vect: Move VMAT_GATHER_SCATTER handlings from final loop nest
Following Richi's suggestion [1], this patch is to move the
handlings on VMAT_GATHER_SCATTER in the final loop nest
of function vectorizable_load to its own loop. Basically
it duplicates the final loop nest, clean up some useless
set up code for the case of VMAT_GATHER_SCATTER, remove some
unreachable code. Also remove the corresponding handlings
in the final loop nest.
* tree-vect-stmts.cc (vectorizable_load): Move the handlings on
VMAT_GATHER_SCATTER in the final loop nest to its own loop,
and update the final nest accordingly.
Lehua Ding [Fri, 18 Aug 2023 02:52:53 +0000 (10:52 +0800)]
RISC-V: Fix -march error of zhinxmin testcases
This little patch fixs the -march error of a zhinxmin testcase I added earlier
and an old zhinxmin testcase, since these testcases are for zhinxmin extension
and not zfhmin extension.
Andrew Pinski [Thu, 17 Aug 2023 19:16:25 +0000 (12:16 -0700)]
Document cond_neg, cond_one_cmpl, cond_len_neg and cond_len_one_cmpl standard patterns
When I added `cond_one_cmpl` (and the corresponding IFN) I had noticed cond_neg
standard named pattern was not documented and this adds the documentation for
all 4 named patterns now.
OK? Tested by building the manual.
gcc/ChangeLog:
* doc/md.texi (Standard patterns): Document cond_neg, cond_one_cmpl,
cond_len_neg and cond_len_one_cmpl.
Lehua Ding [Sat, 12 Aug 2023 08:12:45 +0000 (16:12 +0800)]
RISC-V: Add the missed half floating-point mode patterns of local_pic_load/store when only use zfhmin or zhinxmin
Hi,
There is a new failed RISC-V testcase(testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c)
on the current trunk branch when use medany as default cmodel.
The reason is the load of half floating-point imm is convert from RTL 1 to RTL
2 as the cmodel be changed from medlow to medany. This change let insn 7 be
combineed with @pred_broadcast patterns (insn 8) at combine pass. However,
insn 6 and insn 7 are combined for SF and DF mode, but not for HF mode, and
the fail combined leads to insn 7 and insn 8 be combined. The reason of the
fail combined is the local_pic_loadhf pattern doesn't exist when only enable
zfhmin(implied by zvfh).
Therefore, when only zfhmin but not zfh is enabled, the define_insn of
*local_pic_load<ANYF:mode> must also be able to produce the pattern for
*load_pic_loadhf pattern, since the zfhmin extension also includes a
half floating-point load/store instructions. So, I added an ANFLSF Iterator
and applied it to local_pic_load/store define_insns. I have checked other ANYF
usage scenarios and feel that this is the only place that needs to be corrected.
I may have missed something, please correct. Thanks.
Lehua Ding [Mon, 14 Aug 2023 03:34:13 +0000 (11:34 +0800)]
RISC-V: Revert the convert from vmv.s.x to vmv.v.i
Hi,
This patch revert the convert from vmv.s.x to vmv.v.i and add new pattern
optimize the special case when the scalar operand is zero.
Currently, the broadcast pattern where the scalar operand is a imm
will be converted to vmv.v.i from vmv.s.x and the mask operand will be
converted from 00..01 to 11..11. There are some advantages and
disadvantages before and after the conversion after discussing
with Juzhe offline and we chose not to do this transform.
Before:
Advantages: The vsetvli info required by vmv.s.x has better compatibility since
vmv.s.x only required SEW and VLEN be zero or one. That mean there
is more opportunities to combine with other vsetlv infos in vsetvl pass.
Disadvantages: For non-zero scalar imm, one more `li rd, imm` instruction
will be needed.
After:
Advantages: No need `li rd, imm` instruction since vmv.v.i support imm operand.
Disadvantages: Like before's advantages. Worse compatibility leads to more
vsetvl instrunctions need.
Consider the bellow C code and asm after autovec.
there is an extra insn (vsetivli zero, 1, e32, m1, ta, ma)
after converted vmv.s.x to vmv.v.i.
```
int foo1(int* restrict a, int* restrict b, int *restrict c, int n) {
int sum = 0;
for (int i = 0; i < n; i++)
sum += a[i] * b[i];
return sum;
}
```
asm (Before):
```
foo1:
ble a3,zero,.L7
vsetvli a2,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L6:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
sub a3,a3,a5
vle32.v v2,0(a0)
vle32.v v3,0(a1)
add a0,a0,a4
add a1,a1,a4
vmacc.vv v1,v3,v2
bne a3,zero,.L6
vsetvli a2,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
.L7:
li a0,0
ret
```
asm (After):
```
foo1:
ble a3,zero,.L4
vsetvli a2,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L3:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
sub a3,a3,a5
vle32.v v2,0(a0)
vle32.v v3,0(a1)
add a0,a0,a4
add a1,a1,a4
vmacc.vv v1,v3,v2
bne a3,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.v.i v2,0
vsetvli a2,zero,e32,m1,ta,ma
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
.L4:
li a0,0
ret
```
Lehua Ding [Thu, 17 Aug 2023 07:42:51 +0000 (15:42 +0800)]
RISC-V: Forbidden fuse vlmax vsetvl to DEMAND_NONZERO_AVL vsetvl
Hi,
This little patch fix the fail testcase
(gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c)
after apply this patch
(https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627121.html).
The specific reason is that the vsetvl pass has bug and this patch
forbidden the fuse of this case. This patch needs to be committed
before that patch to work.