Richard Biener [Fri, 3 Nov 2023 10:31:37 +0000 (11:31 +0100)]
tree-optimization/112366 - remove assert for failed live lane code gen
The following removes a bogus assert constraining the uses that
can appear when an SLP node built from scalar defs constrains
code generation in a way that earlier uses of the vector CTOR
components fail to get vectorized. We can't really constrain the
operation such a use appears in.
Thomas Schwinge [Wed, 14 Jun 2023 20:39:01 +0000 (22:39 +0200)]
Skip a number of 'g++.dg/tree-prof/' test cases for '-fno-exceptions' testing
Running 'make check' with: 'RUNTESTFLAGS=--target_board=unix/-fno-exceptions',
'error: exception handling disabled' is triggered for C++ 'throw' etc. usage,
and per 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':
# If exceptions are disabled, mark tests expecting exceptions to be enabled
# as unsupported.
if { ![check_effective_target_exceptions_enabled] } {
if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
return "::unsupported::exception handling disabled"
}
Thomas Schwinge [Wed, 7 Jun 2023 14:11:11 +0000 (16:11 +0200)]
Skip a number of 'g++.dg/lto/' test cases for '-fno-exceptions' testing
Running 'make check' with: 'RUNTESTFLAGS=--target_board=unix/-fno-exceptions',
'error: exception handling disabled' is triggered for C++ 'throw' etc. usage,
and per 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':
# If exceptions are disabled, mark tests expecting exceptions to be enabled
# as unsupported.
if { ![check_effective_target_exceptions_enabled] } {
if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
return "::unsupported::exception handling disabled"
}
The "compile"/"assemble" tests (either continue to work, or) result in the
expected 'UNSUPPORTED: [...] compile: exception handling disabled', but
dependent "link" and "execute" tests then turn UNRESOLVED.
Specify 'dg-require-effective-target exceptions_enabled' for those test cases.
Thomas Schwinge [Wed, 7 Jun 2023 14:11:11 +0000 (16:11 +0200)]
Skip a number of 'g++.dg/compat/' test cases for '-fno-exceptions' testing
Running 'make check' with: 'RUNTESTFLAGS=--target_board=unix/-fno-exceptions',
'error: exception handling disabled' is triggered for C++ 'throw' etc. usage,
and per 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':
# If exceptions are disabled, mark tests expecting exceptions to be enabled
# as unsupported.
if { ![check_effective_target_exceptions_enabled] } {
if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
return "::unsupported::exception handling disabled"
}
The "compile"/"assemble" tests (either continue to work, or) result in the
expected 'UNSUPPORTED: [...] compile: exception handling disabled', but
dependent "link" and "execute" tests then turn UNRESOLVED.
Specify 'dg-require-effective-target exceptions_enabled' for those test cases.
Thomas Schwinge [Wed, 7 Jun 2023 12:14:44 +0000 (14:14 +0200)]
Skip a number of C++ test cases for '-fno-exceptions' testing
Running 'make check' with: 'RUNTESTFLAGS=--target_board=unix/-fno-exceptions',
'error: exception handling disabled' is triggered for C++ 'throw' etc. usage,
and per 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':
# If exceptions are disabled, mark tests expecting exceptions to be enabled
# as unsupported.
if { ![check_effective_target_exceptions_enabled] } {
if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
return "::unsupported::exception handling disabled"
}
However, if there are additional 'dg-error' etc. directives, these may regress
PASS -> FAIL (or similar) -- if their associated diagnostics are precluded by
'error: exception handling disabled'. For example:
PASS: g++.dg/cpp2a/explicit1.C (test for errors, line 43)
PASS: g++.dg/cpp2a/explicit1.C (test for errors, line 47)
[-PASS:-]{+FAIL:+} g++.dg/cpp2a/explicit1.C (test for errors, line 50)
[-PASS:-]{+FAIL:+} g++.dg/cpp2a/explicit1.C (test for errors, line 51)
PASS: g++.dg/cpp2a/explicit1.C (test for errors, line 52)
PASS: g++.dg/cpp2a/explicit1.C (test for errors, line 53)
PASS: g++.dg/cpp2a/explicit1.C (test for errors, line 59)
[-PASS:-]{+UNSUPPORTED:+} g++.dg/cpp2a/explicit1.C [-(test for excess errors)-]{+: exception handling disabled+}
Specify 'dg-require-effective-target exceptions_enabled' for those test cases.
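As a hedged illustration (a hypothetical test, not one of the files touched here), the
directive sits with the other dg directives at the top of the test:
/* { dg-do run } */
/* { dg-require-effective-target exceptions_enabled } */
/* ... test body using throw/catch ... */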
The following avoids hoisting expressions that may invoke undefined
behavior and are not computed on all paths. This is realized by
noting that we have to avoid materializing expressions as part
of hoisting that are not part of the set of expressions we have
found eligible for hoisting. Instead of picking the expression
corresponding to the hoistable values from the first successor
we now keep a union of the expressions so that hoisting can pick
the expression that has its dependences fully hoistable.
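As a hedged, reduced illustration of the kind of problem (hypothetical code, not the
PR 112310 testcase), hoisting an expression that may invoke undefined overflow and is
only evaluated on one path would introduce that evaluation on paths that never computed it:
int f (int a, int b, int flag)
{
  int x;
  if (flag)
    x = a + b;   /* signed addition, may overflow, only evaluated on this path */
  else
    x = a;
  return x;      /* hoisting a + b above the branch would evaluate it on both paths */
}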
PR tree-optimization/112310
* tree-ssa-pre.cc (do_hoist_insertion): Keep the union
of expressions, validate dependences are contained within
the hoistable set before hoisting.
Paul Thomas [Fri, 3 Nov 2023 07:11:12 +0000 (07:11 +0000)]
Fortran: Defined operators with unlimited polymorphic args [PR98498]
2023-11-03 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/98498
* interface.cc (upoly_ok): Defined operators using unlimited
polymorphic formal arguments must not override the intrinsic
operator use.
gcc/testsuite/
PR fortran/98498
* gfortran.dg/interface_50.f90: New test.
Pan Li [Thu, 2 Nov 2023 10:40:10 +0000 (18:40 +0800)]
RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator
Update in v2:
* Add a mode size equality check to disable different mode sizes when expanding,
because the underlying codegen is not implemented yet.
Original log:
The previous rounding APIs starting with i/l/ll only work on the same
mode types. For example as below, where we arrange the iterator similarly
to fcvt.
* SF => SI
* DF => DI
After we refined this limitation in the middle-end, these APIs can also be
vectorized with different type sizes, aka:
* HF => SI, HF => DI
* SF => DI, SF => SI
* DF => SI, DF => DI
Then a single iterator cannot take care of this simply, and this patch
re-arranges the iterators into two items:
* V_VLS_F_CONVERT_SI: handle (HF, SF, DF) => SI
* V_VLS_F_CONVERT_DI: handle (HF, SF, DF) => DI
as well as the related mode_attrs to reconcile the new iterators.
gcc/ChangeLog:
* config/riscv/autovec.md (lrint<mode><v_i_l_ll_convert>2): Remove.
(lround<mode><v_i_l_ll_convert>2): Ditto.
(lceil<mode><v_i_l_ll_convert>2): Ditto.
(lfloor<mode><v_i_l_ll_convert>2): Ditto.
(lrint<mode><v_f2si_convert>2): New pattern for cvt from
FP to SI.
(lround<mode><v_f2si_convert>2): Ditto.
(lceil<mode><v_f2si_convert>2): Ditto.
(lfloor<mode><v_f2si_convert>2): Ditto.
(lrint<mode><v_f2di_convert>2): New pattern for cvt from
FP to DI.
(lround<mode><v_f2di_convert>2): Ditto.
(lceil<mode><v_f2di_convert>2): Ditto.
(lfloor<mode><v_f2di_convert>2): Ditto.
* config/riscv/vector-iterators.md: Renew iterators for both
the SI and DI.
Juzhe-Zhong [Fri, 3 Nov 2023 00:36:03 +0000 (08:36 +0800)]
RISC-V: Fix redundant vsetvl in fixed-vlmax vectorized codes[PR112326]
With compile option --param=riscv-autovec-preference=fixed-vlmax, we have
redundant AVL/VL toggling:
vsetvli a5,a3,e8,mf4,ta,ma -> should be changed into e32m1
vle32.v v1,0(a1)
vle32.v v2,0(a0)
vsetivli zero,4,e32,m1,ta,ma -> redundant
slli a2,a5,2
vadd.vv v1,v1,v2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma -> redundant
vse32.v v1,0(a4)
add a0,a0,a2
add a1,a1,a2
add a4,a4,a2
bne a3,zero,.L3
The root cause is that we simplify AVL into an immediate AVL too early
in the FIXED-VLMAX situation. The later avlprop pass then fails to propagate the AVL
generated by (SELECT_VL/vsetvl VL, AVL) into the normal RVV instruction.
So we need to remove the immediate AVL simplification in the 'expand' stage.
After removing that simplification, the following situation should be fixed:
typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
__attribute__ ((noipa)) void
f_vnx2qi (int8_t a, int8_t b, int8_t *out)
{
vnx2qi v = {a, b};
*(vnx2qi *) out = v;
}
We should use vsetivli zero,2 instead of vsetvl a5,zero.
Such simplification is done in the avlprop pass, which is also included in this patch
to fix the regression in these situations.
PR target/112326
gcc/ChangeLog:
* config/riscv/riscv-avlprop.cc (get_insn_vtype_mode): New function.
(simplify_replace_vlmax_avl): Ditto.
(pass_avlprop::execute): Add immediate AVL simplification.
* config/riscv/riscv-protos.h (imm_avl_p): Rename.
* config/riscv/riscv-v.cc (const_vlmax_p): Ditto.
(imm_avl_p): Ditto.
(emit_vlmax_insn): Adapt for new interface name.
* config/riscv/vector.md (mode_idx): New attribute.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr112326.c: New test.
Jason Merrill [Thu, 2 Nov 2023 03:21:34 +0000 (23:21 -0400)]
c++: use hash_set in nrv_data
I noticed we were using a hash_table directly here instead of the simpler
hash_set interface. Also, let's check for the variable itself and repeats
earlier, since they should happen more often than any of the other cases.
gcc/cp/ChangeLog:
* semantics.cc (nrv_data): Change visited to hash_set.
(finalize_nrv_r): Reorganize.
Jason Merrill [Mon, 30 Oct 2023 21:44:54 +0000 (17:44 -0400)]
c++: retval dtor on rethrow [PR112301]
In r12-6333 for PR33799, I fixed the example in [except.ctor]/2. In that
testcase, the exception is caught and the function returns again,
successfully.
In this testcase, however, the exception is rethrown, and hits two separate
cleanups: one in the try block and the other in the function body. So we
destroy twice an object that was only constructed once.
Fortunately, the fix for the normal case is easy: we just need to clear the
"return value constructed by return" flag when we do it the first time.
This gets more complicated with the named return value optimization, since
we don't want to destroy the return value while the NRV variable is still in
scope.
PR c++/112301
PR c++/102191
PR c++/33799
gcc/cp/ChangeLog:
* except.cc (maybe_splice_retval_cleanup): Clear
current_retval_sentinel when destroying retval.
* semantics.cc (nrv_data): Add in_nrv_cleanup.
(finalize_nrv): Set it.
(finalize_nrv_r): Fix handling of throwing cleanups.
Jeff Law [Thu, 2 Nov 2023 13:25:39 +0000 (07:25 -0600)]
[committed] Improve H8 sequences for single bit sign extractions
Spurred by Roger's recent work on ARC, this patch improves the code we
generate for single bit sign extractions.
The basic idea is to get the bit we want into C, then use a subx;ext.w;ext.l
sequence to sign extend it in a GPR.
For bits 0..15 we can use a bld instruction to get the bit we want into C. For
bits 16..31, we can move the high word into the low word, then use bld.
There are a couple of special cases where we can shift the bit we want from the high
word into C, which is one instruction smaller.
Not surprisingly most cases seen in newlib and the test suite are extractions
from the low byte, HImode sign bit and top two bits of SImode.
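For reference, a hedged C-level example of a single bit sign extraction of the kind
targeted here (hypothetical source, not taken from the patch or testsuite):
struct S
{
  int flag : 1;   /* single-bit signed bit-field */
};
int
get_flag (struct S *p)
{
  return p->flag;   /* extracts one bit and sign extends it: the result is 0 or -1 */
}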
Regression tested on the H8 with no regressions. Installing on the trunk.
gcc/
* config/h8300/combiner.md: Add new patterns for single bit
sign extractions.
Pan Li [Thu, 2 Nov 2023 10:40:10 +0000 (18:40 +0800)]
RISC-V: Refactor prefix [I/L/LL] rounding API autovec iterator
The previous rounding APIs starting with i/l/ll only work on the same
mode types. For example as below, where we arrange the iterator similarly
to fcvt.
* SF => SI
* DF => DI
After we refined this limitation in the middle-end, these APIs can also be
vectorized with different type sizes, aka:
* HF => SI, HF => DI
* SF => DI, SF => SI
* DF => SI, DF => DI
Then a single iterator cannot take care of this simply, and this patch
re-arranges the iterators into two items:
* V_VLS_F_CONVERT_SI: handle (HF, SF, DF) => SI
* V_VLS_F_CONVERT_DI: handle (HF, SF, DF) => DI
as well as the related mode_attrs to reconcile the new iterators.
gcc/ChangeLog:
* config/riscv/autovec.md (lrint<mode><v_i_l_ll_convert>2): Remove.
(lround<mode><v_i_l_ll_convert>2): Ditto.
(lceil<mode><v_i_l_ll_convert>2): Ditto.
(lfloor<mode><v_i_l_ll_convert>2): Ditto.
(lrint<mode><v_f2si_convert>2): New pattern for cvt from
FP to SI.
(lround<mode><v_f2si_convert>2): Ditto.
(lceil<mode><v_f2si_convert>2): Ditto.
(lfloor<mode><v_f2si_convert>2): Ditto.
(lrint<mode><v_f2di_convert>2): New pattern for cvt from
FP to DI.
(lround<mode><v_f2di_convert>2): Ditto.
(lceil<mode><v_f2di_convert>2): Ditto.
(lfloor<mode><v_f2di_convert>2): Ditto.
* config/riscv/vector-iterators.md: Renew iterators for both
the SI and DI.
Robin Dapp [Wed, 13 Sep 2023 20:19:35 +0000 (22:19 +0200)]
ifcvt/vect: Emit COND_OP for conditional scalar reduction.
As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions. This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD/COND_OP during
ifcvt and adjusting some vectorizer code to handle it.
It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.
gcc/ChangeLog:
PR middle-end/111401
* internal-fn.cc (internal_fn_else_index): New function.
* internal-fn.h (internal_fn_else_index): Define.
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
(neutral_op_for_reduction): Return -0 for PLUS.
(check_reduction_path): Don't count else operand in COND_OP.
(vect_is_simple_reduction): Ditto.
(vect_create_epilog_for_reduction): Fix whitespace.
(vectorize_fold_left_reduction): Add COND_OP handling.
(vectorizable_reduction): Don't count else operand in COND_OP.
(vect_transform_reduction): Add COND_OP handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
Richard Biener [Thu, 2 Nov 2023 09:39:03 +0000 (10:39 +0100)]
tree-optimization/112320 - bogus debug IL after SCCP
The following addresses wrong debug IL created by SCCP rewriting stmts
to defined overflow. I addressed another inefficiency there but
needed to adjust the API of rewrite_to_defined_overflow for this
which is now taking a stmt iterator for in-place operation and a
stmt for sequence producing because gsi_for_stmt doesn't work for
stmts not in the IL.
PR tree-optimization/112320
* gimple-fold.h (rewrite_to_defined_overflow): New overload
for in-place operation.
* gimple-fold.cc (rewrite_to_defined_overflow): Add stmt
iterator argument to worker, define separate API for
in-place and not in-place operation.
* tree-if-conv.cc (predicate_statements): Simplify.
* tree-scalar-evolution.cc (final_value_replacement_loop):
Likewise.
* tree-ssa-ifcombine.cc (pass_tree_ifcombine::execute): Adjust.
* tree-ssa-reassoc.cc (update_range_test): Likewise.
The following patch implements C++26 unevaluated-string.
As it seems to me just extra pedanticity, it is implemented only for
-std=c++26 or -std=gnu++26 and later and only if -pedantic/-pedantic-errors.
Nothing is done for inline asm; while the spec changes those, it changes them
to a balanced token sequence with implementation-defined rules on what is
and isn't allowed (so pedantically, accepting asm ("" : "+m" (x));
was accepts-invalid before C++26, but we didn't diagnose anything).
For the other spots mentioned in the paper (static_assert message,
linkage specification, deprecated/nodiscard attributes) it enforces the
requirements: no prefixes, no udlit suffixes, no octal/hexadecimal escapes
(conditional escape sequences were rejected with pedantic already before).
For the deprecated operator "" identifier case I've kept things as is,
because everything seems to have been diagnosed already (a lot being implied
from the string having to be empty).
2023-11-02 Jakub Jelinek <jakub@redhat.com>
PR c++/110342
gcc/cp/
* parser.cc: Implement C++26 P2361R6 - Unevaluated strings.
(uneval_string_attr): New enumerator.
(cp_parser_string_literal_common): Add UNEVAL argument. If true,
pass CPP_UNEVAL_STRING rather than CPP_STRING to
cpp_interpret_string_notranslate.
(cp_parser_string_literal, cp_parser_userdef_string_literal): Adjust
callers of cp_parser_string_literal_common.
(cp_parser_unevaluated_string_literal): New function.
(cp_parser_parenthesized_expression_list): Handle uneval_string_attr.
(cp_parser_linkage_specification): Use
cp_parser_unevaluated_string_literal for C++26.
(cp_parser_static_assert): Likewise.
(cp_parser_std_attribute): Use uneval_string_attr for standard
deprecated and nodiscard attributes.
gcc/testsuite/
* g++.dg/cpp26/unevalstr1.C: New test.
* g++.dg/cpp26/unevalstr2.C: New test.
* g++.dg/cpp0x/udlit-error1.C (lol): Expect an error for C++26
about user-defined literal in deprecated attribute.
libcpp/
* include/cpplib.h (TTYPE_TABLE): Add CPP_UNEVAL_STRING literal
entry. Use C++11 instead of C++-0x in comments.
* charset.cc (convert_escape): Add UNEVAL argument, if true,
pedantically diagnose numeric escape sequences.
(cpp_interpret_string_1): Formatting fix. Adjust convert_escape
caller.
(cpp_interpret_string): Formatting fix.
(cpp_interpret_string_notranslate): Pass type through to
cpp_interpret_string if it is CPP_UNEVAL_STRING.
Pan Li [Mon, 30 Oct 2023 07:29:21 +0000 (15:29 +0800)]
VECT: Refine the type size restriction of call vectorizer
Update in v4:
* Append the check to vectorizable_internal_function.
Update in v3:
* Add a function to predicate whether the type size is legal for a vectorizer call.
Update in v2:
* Fix one ICE of type assertion.
* Adjust some test cases for aarch64 sve and riscv vector.
Original log:
vectorizable_call has one restriction on the size of the data type:
DF to DI is allowed but SF to DI isn't. You may see the below message
when trying to vectorize a function call like lrintf.
void
test_lrintf (long *out, float *in, unsigned count)
{
for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lrintf (in[i]);
}
Then a standard name pattern like lrintmn2 cannot work for a different
data type size like SF => DI. This patch refines this data
type size check and unblocks standard names like lrintmn2 under certain conditions.
The type size of vectype_out needs to be exactly the same as the type
size of vectype_in when the vectype_out size isn't participating in
the optab selection, while there is no such restriction when
vectype_out is somehow a part of the optab query.
The below tests pass for this patch:
* The risc-v regression tests.
* Ensure the lrintf standard name in risc-v.
The below tests are ongoing:
* The x86 bootstrap and regression test.
* The aarch64 regression test.
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_internal_function): Add type
size check for vectype_out not participating in the optab query.
(vectorizable_call): Remove the type size check.
Eric Gallager [Mon, 16 Oct 2023 23:22:17 +0000 (19:22 -0400)]
Add files to discourage submissions of PRs to the GitHub mirror.
Currently there is an unofficial mirror of GCC on GitHub that people
sometimes submit pull requests to:
https://github.com/gcc-mirror/gcc
However, this is not the proper way to contribute to GCC, so that means
that someone (usually Jonathan Wakely) has to go through the PRs and
manually tell people that they're sending their PRs to the wrong place.
One thing that would help mitigate this problem would be files in a
special .github directory that GitHub would automatically open when
contributors attempt to open a PR, that would then tell them the proper
way to contribute instead. This patch attempts to add two such files.
They are written in Markdown, which I'm realizing might require some
special handling in this repository, since the ".md" extension is also
used for GCC's "Machine Description" files here, but I'm not quite sure
how to go about handling that. Also note that I adapted these files from
equivalent files in the git repository for Git itself:
https://github.com/git/git/blob/master/.github/CONTRIBUTING.md
https://github.com/git/git/blob/master/.github/PULL_REQUEST_TEMPLATE.md
What do people think?
ChangeLog:
* .github/CONTRIBUTING.md: New file.
* .github/PULL_REQUEST_TEMPLATE.md: New file.
Roger Sayle [Wed, 1 Nov 2023 22:33:45 +0000 (22:33 +0000)]
PR target/110551: Tweak mulx register allocation using peephole2.
This patch is a follow-up to my previous PR target/110551 patch, this
time to address the additional move after mulx, seen on TARGET_BMI2
architectures (such as -march=haswell). The complication here is
that the flexible multiple-set mulx instruction is introduced into
RTL after reload, by split2, and therefore can't benefit from register
preferencing. This results in RTL like the following:
(insn 32 31 17 2 (parallel [
(set (reg:DI 4 si [orig:101 r ] [101])
(mult:DI (reg:DI 1 dx [109])
(reg:DI 5 di [109])))
(set (reg:DI 5 di [ r+8 ])
(umul_highpart:DI (reg:DI 1 dx [109])
(reg:DI 5 di [109])))
]) "pr110551-2.c":8:17 -1
(nil))
Here insn 32, the mulx instruction, places its results in si and di,
and then immediately after decides to move di to ax, with di now dead.
This can be trivially cleaned up by a peephole2. I've added an
additional constraint that the two SET_DESTs can't be the same
register to avoid confusing the middle-end, but this has well-defined
behaviour on x86_64/BMI2, encoding a umul_highpart.
For the new test case, compiled on x86_64 with -O2 -march=haswell:
2023-11-01 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/110551
* config/i386/i386.md (*bmi2_umul<mode><dwi>3_1): Tidy condition
as operands[2] with predicate register_operand must be !MEM_P.
(peephole2): Optimize a mulx followed by a register-to-register
move, to place result in the correct destination if possible.
gcc/testsuite/ChangeLog
PR target/110551
* gcc.target/i386/pr110551-2.c: New test case.
Patrick O'Neill [Tue, 31 Oct 2023 20:18:53 +0000 (13:18 -0700)]
RISC-V: Use riscv_subword_address for atomic_test_and_set
Other subword atomic patterns use riscv_subword_address to calculate
the aligned address, shift amount, mask and !mask. atomic_test_and_set
was implemented before the common function was added. After this patch
all subword atomic patterns use riscv_subword_address.
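A rough C sketch of what such a subword-address helper computes (an assumed shape for
illustration only, not the actual riscv_subword_address implementation):
#include <stdint.h>
/* For a byte at ADDR, compute the containing 32-bit aligned word address,
   the bit shift of the byte within it, the byte mask and its complement.  */
static void
subword_parts (uintptr_t addr, uintptr_t *aligned, int *shift,
               uint32_t *mask, uint32_t *not_mask)
{
  *aligned = addr & ~(uintptr_t) 3;    /* word containing the byte */
  *shift = (addr & 3) * 8;             /* bit offset of the byte (little endian) */
  *mask = (uint32_t) 0xff << *shift;   /* selects the byte */
  *not_mask = ~*mask;                  /* everything except the byte */
}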
gcc/ChangeLog:
* config/riscv/sync.md: Use riscv_subword_address function to
calculate the address and shift in atomic_test_and_set.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Vineet Gupta [Wed, 1 Nov 2023 21:46:33 +0000 (14:46 -0700)]
RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls
Fixes: 3496ca4e6566 ("RISC-V: Add runtime invariant support")
riscv_promote_function_mode doesn't promote SI to DI for the libcalls
case. It intends to do that, but the code is broken (a regression).
The fix is what the generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping through the
debugger shows the old code didn't and the fixed one does.
This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code path.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
returned for libcall case.
Martin Uecker [Thu, 27 Jul 2023 11:36:05 +0000 (13:36 +0200)]
c: Add Walloc-size to warn about insufficient size in allocations [PR71219]
Add option Walloc-size that warns about allocations that have
insufficient storage for the target type of the pointer the
storage is assigned to. Added to Wextra.
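A hedged example of the kind of code the new warning targets (hypothetical snippet,
not from the patch or its tests):
#include <stdlib.h>
int *
make_int (void)
{
  int *p = malloc (1);   /* one byte is insufficient storage for an int,
                            so -Walloc-size would warn here */
  return p;
}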
Edwin Lu [Wed, 1 Nov 2023 17:19:14 +0000 (10:19 -0700)]
Make genautomata.cc output reflect insn-attr.h expectation
genautomata was writing the insn_has_dfa_reservation_p function
inside of the CPU_UNITS_QUERY conditional when it shouldn't have.
Move insn_has_dfa_reservation_p outside of conditional group.
Andre Vieira [Wed, 1 Nov 2023 17:02:41 +0000 (17:02 +0000)]
omp: Reorder call for TARGET_SIMD_CLONE_ADJUST
This patch moves the call to TARGET_SIMD_CLONE_ADJUST until after the arguments
and return types have been transformed into vector types. It also constructs
the adjustments and retval modifications after this call, allowing targets to
alter the types of the arguments and return of the clone prior to the
modifications to the function definition.
gcc/ChangeLog:
* omp-simd-clone.cc (simd_clone_adjust_return_type): Hoist out code to
create return array and don't return new type.
(simd_clone_adjust_argument_types): Hoist out code that creates
ipa_param_body_adjustments and don't return them.
(simd_clone_adjust): Call TARGET_SIMD_CLONE_ADJUST after return and
argument types have been vectorized, create adjustments and return array
after the hook.
(expand_simd_clones): Call TARGET_SIMD_CLONE_ADJUST after return and
argument types have been vectorized.
Uros Bizjak [Wed, 1 Nov 2023 09:41:57 +0000 (10:41 +0100)]
i386: Improve stack protector patterns and peephole2s
Improve stack protector patterns and peephole2s to substitute stack
protector scratch register clear with unrelated subsequent register
initialization in several ways:
a. Explicitly generate scratch register as named pseudo. This allows
optimizers to eventually reuse the zero value in the register.
b. Allow scratch register in different mode (SWI48) than PTR mode:
SImode moves on x86 zero-extend to the whole DImode register,
so stack protector paranoia is not compromised.
c. Relax peephole2 constraint that stack protector scratch register
must match new initialized register. This relaxation substantially
improves peephole2 opportunities, and generates sequences like:
We have to ensure the new scratch is dead in front of the sequence.
The patch also fixes omission of earlyclobbers for all alternatives of
new initialized register in *stack_protect_set_3, avoiding the need for
reg_overlap_mentioned_p constraint. Earlyclobbers are per alternative,
not per operand.
Also, instructions are already valid in peephole2 pass, so we don't
have to explicitly re-check their operands for validity.
gcc/ChangeLog:
* config/i386/i386.md (stack_protect_set): Explicitly
generate scratch register in word mode.
(@stack_protect_set_1_<mode>): Rename to ...
(@stack_protect_set_1_<PTR:mode>_<SWI48:mode>): ... this.
Use SWI48 mode iterator to match scratch register.
(stack_protect_set_1 peephole2): Use PTR, W and SWI48 mode
iterators to match peephole sequence. Use general_operand
predicate for operand 4. Allow different operand 2 and operand 3
registers and use peep2_reg_dead_p to ensure new scratch
register is dead before peephole sequence. Use peep2_reg_dead_p
to ensure old scratch register is dead after peephole sequence.
(*stack_protect_set_2_<mode>): Rename to ...
(*stack_protect_set_2_<mode>_si): ... this.
(*stack_protect_set_3): Rename to ...
(*stack_protect_set_2_<mode>_di): ... this.
Use PTR mode iterator to match stack protector memory move.
Use earlyclobber for all alternatives of operand 1.
(stack_protect_set_2 peephole2): Use PTR, W and SWI48 mode
iterators to match peephole sequence. Use general_operand
predicate for operand 4. Allow different operand 2 and operand 3
registers and use peep2_reg_dead_p to ensure new scratch
register is dead before peephole sequence. Use peep2_reg_dead_p
to ensure old scratch register is dead after peephole sequence.
Gaius Mulley [Wed, 1 Nov 2023 09:05:10 +0000 (09:05 +0000)]
PR modula2/102989: reimplement overflow detection in ztype though WIDE_INT_MAX_PRECISION
The ZTYPE in iso modula2 is used to denote intemediate ordinal type const
expressions and these are always converted into the
approriate language or user ordinal type prior to code generation.
The increase of bits supported by _BitInt causes the modula2 largeconst.mod
regression failure tests to pass. The largeconst.mod test has been
increased to fail, however the char at a time overflow check is now too slow
to detect failure. The overflow detection for the ZTYPE has been
rewritten to check against exceeding WIDE_INT_MAX_PRECISION (many orders of
magnitude faster).
gcc/m2/ChangeLog:
PR modula2/102989
* gm2-compiler/SymbolTable.mod (OverflowZType): Import from m2expr.
(ConstantStringExceedsZType): Remove import.
(GetConstLitType): Replace ConstantStringExceedsZType with OverflowZType.
* gm2-gcc/m2decl.cc (m2decl_ConstantStringExceedsZType): Remove.
(m2decl_BuildConstLiteralNumber): Re-write.
* gm2-gcc/m2decl.def (ConstantStringExceedsZType): Remove.
* gm2-gcc/m2decl.h (m2decl_ConstantStringExceedsZType): Remove.
* gm2-gcc/m2expr.cc (m2expr_StrToWideInt): Rewrite to check overflow.
(m2expr_OverflowZType): New function.
(ToWideInt): New function.
* gm2-gcc/m2expr.def (OverflowZType): New procedure function declaration.
* gm2-gcc/m2expr.h (m2expr_OverflowZType): New prototype.
gcc/testsuite/ChangeLog:
PR modula2/102989
* gm2/pim/fail/largeconst.mod: Updated foo to an outrageous value.
* gm2/pim/fail/largeconst2.mod: Duplicate test removed.
gcc/analyzer/ChangeLog:
* record-layout.cc: New file, based on material in region-model.cc.
* record-layout.h: Likewise.
* region-model.cc: Include "analyzer/record-layout.h".
(class record_layout): Move to record-layout.cc and .h
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
This patch eliminates the function "MACRO_MAP_EXPANSION_POINT_LOCATION"
(which hasn't been a macro since r6-739-g0501dbd932a7e9) in favor of
a new line_map_macro::get_expansion_point_location accessor.
No functional change intended.
gcc/c-family/ChangeLog:
* c-warn.cc (warn_for_multistatement_macros): Update for removal
of MACRO_MAP_EXPANSION_POINT_LOCATION.
gcc/cp/ChangeLog:
* module.cc (ordinary_loc_of): Update for removal of
MACRO_MAP_EXPANSION_POINT_LOCATION.
(module_state::note_location): Update for renaming of field.
(module_state::write_macro_maps): Likewise.
gcc/ChangeLog:
* input.cc (dump_location_info): Update for removal of
MACRO_MAP_EXPANSION_POINT_LOCATION.
* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc):
Likewise.
libcpp/ChangeLog:
* include/line-map.h
(line_map_macro::get_expansion_point_location): New accessor.
(line_map_macro::expansion): Rename field to...
(line_map_macro::mexpansion): ... this.
(MACRO_MAP_EXPANSION_POINT_LOCATION): Delete this function.
* line-map.cc (linemap_enter_macro): Update for renaming of field.
(linemap_macro_map_loc_to_exp_point): Update for removal of
MACRO_MAP_EXPANSION_POINT_LOCATION.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Tue, 31 Oct 2023 21:05:41 +0000 (17:05 -0400)]
opts.cc: fix comment about DOCUMENTATION_ROOT_URL
gcc/ChangeLog:
* opts.cc (get_option_url): Update comment; the requirement to
pass DOCUMENTATION_ROOT_URL's value via -D was removed in r10-8065-ge33a1eae25b8a8.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Juzhe-Zhong [Thu, 26 Oct 2023 11:50:19 +0000 (19:50 +0800)]
VECT: Support SLP MASK_LEN_GATHER_LOAD with conditional mask
This patch leverages the current MASK_GATHER_LOAD to support SLP MASK_LEN_GATHER_LOAD with a conditional mask.
Unconditional MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1) SLP is not included in this patch
since it seems that we can't support it in the middle-end:
FAIL: gcc.dg/tree-ssa/pr44306.c (internal compiler error: in vectorizable_load, at tree-vect-stmts.cc:9885)
Maybe we should support GATHER_LOAD explicitly in the RISC-V backend to work around this issue.
I am going to add an explicit GATHER_LOAD workaround in the RISC-V backend.
This patch also adds a conditional gather load test since there was no such test before.
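A hedged sketch of the kind of conditional gather loop this targets (a hypothetical
test shape, not the added testcase):
void
f (int *restrict dst, int *restrict src, int *restrict idx,
   int *restrict cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      dst[i] = src[idx[i]];   /* gather load guarded by a conditional mask */
}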
bpf: Improvements in CO-RE builtins implementation.
This patch moves the processing of the preserve_access_index attribute to
its own independent gimple lowering pass.
This approach is more consistent with the implementation of the CO-RE
builtins when used explicitly in the code. The attributed type accesses
are now converted early to the __builtin_core_reloc builtin instead of being
kept as an expression in code throughout all of the middle-end.
This prevents the compiler from optimizing out or manipulating the expression
using the locally defined type, instead of assuming nothing is known about
this expression, as should be the case for all of the CO-RE
relocations.
In the process, __builtin_preserve_access_index has also been
improved to generate code for more complex expressions that
require more than one CO-RE relocation.
This turned out to be a requirement, since the bpf-next selftests rely on
loop unrolling in order to convert an undefined index array access into a
defined one. It seemed too much to expect the unroll to always happen, and for
that reason GCC still generates correct code in such scenarios, even when the index
access is never predictable or unrolling does not occur.
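A hedged sketch of explicit __builtin_preserve_access_index use (an assumed usage
shape, shown only to illustrate the builtin mentioned above):
struct data
{
  int a;
  int b;
};
int
read_b (struct data *d)
{
  /* The access is recorded as a CO-RE relocation rather than a fixed offset.  */
  return __builtin_preserve_access_index (d->b);
}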
gcc/ChangeLog:
* config/bpf/bpf-passes.def (pass_lower_bpf_core): Added pass.
* config/bpf/bpf-protos.h: Added prototype for new pass.
* config/bpf/bpf.cc (bpf_delegitimize_address): New function.
* config/bpf/bpf.md (mov_reloc_core<MM:mode>): Prefixed
name with '*'.
* config/bpf/core-builtins.cc (cr_builtins): Added access_node to
struct.
(is_attr_preserve_access): Improved check.
(core_field_info): Make use of root_for_core_field_info
function.
(process_field_expr): Adapted to new functions.
(pack_type): Small improvement.
(bpf_handle_plugin_finish_type): Adapted to GTY(()).
(bpf_init_core_builtins): Changed to new function names.
(construct_builtin_core_reloc): Improved implementation.
(bpf_resolve_overloaded_core_builtin): Changed how
__builtin_preserve_access_index is converted.
(compute_field_expr): Corrected implementation. Added
access_node argument.
(bpf_core_get_index): Added valid argument.
(root_for_core_field_info, pack_field_expr)
(core_expr_with_field_expr_plus_base, make_core_safe_access_index)
(replace_core_access_index_comp_expr, maybe_get_base_for_field_expr)
(core_access_clean, core_is_access_index, core_mark_as_access_index)
(make_gimple_core_safe_access_index, execute_lower_bpf_core)
(make_pass_lower_bpf_core): Added functions.
(pass_data_lower_bpf_core): New pass struct.
(pass_lower_bpf_core): New gimple_opt_pass class.
(pack_field_expr_for_preserve_field)
(bpf_replace_core_move_operands): Removed function.
(bpf_enum_value_kind): Added GTY(()).
* config/bpf/core-builtins.h (bpf_field_info_kind, bpf_type_id_kind)
(bpf_type_info_kind, bpf_enum_value_kind): New enum.
* config/bpf/t-bpf: Added pass bpf-passes.def to PASSES_EXTRA.
gcc/testsuite/ChangeLog:
* gcc.target/bpf/core-attr-5.c: New test.
* gcc.target/bpf/core-attr-6.c: New test.
* gcc.target/bpf/core-builtin-1.c: Corrected
* gcc.target/bpf/core-builtin-enumvalue-opt.c: Corrected regular
expression.
* gcc.target/bpf/core-builtin-enumvalue.c: Corrected regular
expression.
* gcc.target/bpf/core-builtin-exprlist-1.c: New test.
* gcc.target/bpf/core-builtin-exprlist-2.c: New test.
* gcc.target/bpf/core-builtin-exprlist-3.c: New test.
* gcc.target/bpf/core-builtin-exprlist-4.c: New test.
* gcc.target/bpf/core-builtin-fieldinfo-offset-1.c: Extra tests
Neal Frager [Mon, 30 Oct 2023 17:02:53 +0000 (17:02 +0000)]
gcc: config: microblaze: fix cpu version check
The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp
instead of strverscmp to check the mcpu version against feature
options. By simply changing the define to use strverscmp,
the new version 10.0 is treated correctly as a higher version
than previous versions.
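A hedged illustration of the difference (hypothetical version strings; strcasecmp
compares lexicographically while the GNU strverscmp compares embedded numbers
numerically):
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
int
main (void)
{
  /* Lexicographically "10.0" sorts before "9.6" because '1' < '9'.  */
  printf ("%d\n", strcasecmp ("v10.0", "v9.6") < 0);   /* prints 1 */
  /* strverscmp compares the digit runs as numbers, so 10.0 > 9.6.  */
  printf ("%d\n", strverscmp ("v10.0", "v9.6") > 0);   /* prints 1 */
  return 0;
}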
gcc/ChangeLog:
* config/microblaze/microblaze.cc: Fix mcpu version check.
Patrick O'Neill [Mon, 30 Oct 2023 22:54:04 +0000 (15:54 -0700)]
RISC-V: Require the A extension for testcases with atomic insns
Add testsuite infrastructure for the A extension and use it to require the A
extension for dg-do run and add the A extension for non-A dg-do compile.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/amo-table-a-6-amo-add-1.c: Add A extension to
dg-options for dg-do compile.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: Ditto.
* gcc.target/riscv/inline-atomics-2.c: Ditto.
* gcc.target/riscv/inline-atomics-3.c: Require A extension for dg-do
run.
* gcc.target/riscv/inline-atomics-4.c: Ditto.
* gcc.target/riscv/inline-atomics-5.c: Ditto.
* gcc.target/riscv/inline-atomics-6.c: Ditto.
* gcc.target/riscv/inline-atomics-7.c: Ditto.
* gcc.target/riscv/inline-atomics-8.c: Ditto.
* lib/target-supports.exp: Add testing infrastructure to require the A
extension or add it to an existing -march.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
riscv: thead: Add support for the XTheadFMemIdx ISA extension
The XTheadFMemIdx ISA extension provides additional load and store
instructions for floating-point registers with new addressing modes.
The following memory accesses types are supported:
* load/store: [w,d] (single-precision FP, double-precision FP)
The following addressing modes are supported:
* register offset with additional immediate offset (4 instructions):
flr<type>, fsr<type>
* zero-extended register offset with additional immediate offset
(4 instructions): flur<type>, fsur<type>
These addressing modes are also part of the similar XTheadMemIdx
ISA extension support, whose code is reused and extended to support
floating-point registers.
One challenge that this patch needs to solve are GP registers in FP-mode
(e.g. "(reg:DF a2)"), which cannot be handled by the XTheadFMemIdx
instructions. Such registers are the result of independent
optimizations, which can happen after register allocation.
This patch uses a simple but efficient method to address this:
add a dependency on XTheadMemIdx for the XTheadFMemIdx optimizations.
This allows using the instructions from XTheadMemIdx in the case
of such registers.
The added tests ensure that this feature won't regress without notice.
Testing: GCC regression test suite and SPEC CPU 2017 intrate (base&peak).
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_index_reg_class):
Return GR_REGS for XTheadFMemIdx.
(riscv_regno_ok_for_index_p): Add support for XTheadFMemIdx.
* config/riscv/riscv.h (HARDFP_REG_P): New macro.
* config/riscv/thead.cc (is_fmemidx_mode): New function.
(th_memidx_classify_address_index): Add support for XTheadFMemIdx.
(th_fmemidx_output_index): New function.
(th_output_move): Add support for XTheadFMemIdx.
* config/riscv/thead.md (TH_M_ANYF): New mode iterator.
(TH_M_NOEXTF): Likewise.
(*th_fmemidx_movsf_hardfloat): New INSN.
(*th_fmemidx_movdf_hardfloat_rv64): Likewise.
(*th_fmemidx_I_a): Likewise.
(*th_fmemidx_I_c): Likewise.
(*th_fmemidx_US_a): Likewise.
(*th_fmemidx_US_c): Likewise.
(*th_fmemidx_UZ_a): Likewise.
(*th_fmemidx_UZ_c): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadfmemidx-index-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-index-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-index-xtheadbb.c: New test.
* gcc.target/riscv/xtheadfmemidx-index.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex-xtheadbb.c: New test.
* gcc.target/riscv/xtheadfmemidx-uindex.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
riscv: thead: Add support for the XTheadMemIdx ISA extension
The XTheadMemIdx ISA extension provides additional load and store
instructions with new addressing modes.
The following memory accesses types are supported:
* load: b,bu,h,hu,w,wu,d
* store: b,h,w,d
The following addressing modes are supported:
* immediate offset with PRE_MODIFY or POST_MODIFY (22 instructions):
l<ltype>.ia, l<ltype>.ib, s<stype>.ia, s<stype>.ib
* register offset with additional immediate offset (11 instructions):
lr<ltype>, sr<stype>
* zero-extended register offset with additional immediate offset
(11 instructions): lur<ltype>, sur<stype>
The RISC-V base ISA does not support index registers, so the changes
are kept separate from the RISC-V standard support as much as possible.
To combine the shift/multiply instructions into the memory access
instructions, this patch comes with a few insn_and_split optimizations
that allow the combiner to do this task.
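A hedged C example of the kind of access these combine patterns target (hypothetical
source, shown only for illustration):
long
load_indexed (long *base, long idx)
{
  /* base + (idx << 3): the shift can be folded into a register-offset load
     such as the lr<ltype> forms listed above.  */
  return base[idx];
}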
Handling the different cases of extensions results in a couple of INSNs
that look redundant on first view, but they are just the equivalence
of what we already have for Zbb as well. The only difference is, that
we have much more load instructions.
We already have a constraint with the name 'th_f_fmv', therefore,
the new constraints follow this pattern and have the same length
as required ('th_m_mia', 'th_m_mib', 'th_m_mir', 'th_m_miu').
The added tests ensure that this feature won't regress without notice.
Testing: GCC regression test suite, GCC bootstrap build, and
SPEC CPU 2017 intrate (base&peak) on C920.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config/riscv/constraints.md (th_m_mia): New constraint.
(th_m_mib): Likewise.
(th_m_mir): Likewise.
(th_m_miu): Likewise.
* config/riscv/riscv-protos.h (enum riscv_address_type):
Add new address types ADDRESS_REG_REG, ADDRESS_REG_UREG,
and ADDRESS_REG_WB and their documentation.
(struct riscv_address_info): Add new field 'shift' and
document the field usage for the new address types.
(riscv_valid_base_register_p): New prototype.
(th_memidx_legitimate_modify_p): Likewise.
(th_memidx_legitimate_index_p): Likewise.
(th_classify_address): Likewise.
(th_output_move): Likewise.
(th_print_operand_address): Likewise.
* config/riscv/riscv.cc (riscv_index_reg_class):
Return GR_REGS for XTheadMemIdx.
(riscv_regno_ok_for_index_p): Add support for XTheadMemIdx.
(riscv_classify_address): Call th_classify_address() on top.
(riscv_output_move): Call th_output_move() on top.
(riscv_print_operand_address): Call th_print_operand_address()
on top.
* config/riscv/riscv.h (HAVE_POST_MODIFY_DISP): New macro.
(HAVE_PRE_MODIFY_DISP): Likewise.
* config/riscv/riscv.md (zero_extendqi<SUPERQI:mode>2): Disable
for XTheadMemIdx.
(*zero_extendqi<SUPERQI:mode>2_internal): Convert to expand,
create INSN with same name and disable it for XTheadMemIdx.
(extendsidi2): Likewise.
(*extendsidi2_internal): Disable for XTheadMemIdx.
* config/riscv/thead.cc (valid_signed_immediate): New helper
function.
(th_memidx_classify_address_modify): New function.
(th_memidx_legitimate_modify_p): Likewise.
(th_memidx_output_modify): Likewise.
(is_memidx_mode): Likewise.
(th_memidx_classify_address_index): Likewise.
(th_memidx_legitimate_index_p): Likewise.
(th_memidx_output_index): Likewise.
(th_classify_address): Likewise.
(th_output_move): Likewise.
(th_print_operand_address): Likewise.
* config/riscv/thead.md (*th_memidx_operand): New splitter.
(*th_memidx_zero_extendqi<SUPERQI:mode>2): New INSN.
(*th_memidx_extendsidi2): Likewise.
(*th_memidx_zero_extendsidi2): Likewise.
(*th_memidx_zero_extendhi<GPR:mode>2): Likewise.
(*th_memidx_extend<SHORT:mode><SUPERQI:mode>2): Likewise.
(*th_memidx_bb_zero_extendsidi2): Likewise.
(*th_memidx_bb_zero_extendhi<GPR:mode>2): Likewise.
(*th_memidx_bb_extendhi<GPR:mode>2): Likewise.
(*th_memidx_bb_extendqi<SUPERQI:mode>2): Likewise.
(TH_M_ANYI): New mode iterator.
(TH_M_NOEXTI): Likewise.
(*th_memidx_I_a): New combiner optimization.
(*th_memidx_I_b): Likewise.
(*th_memidx_I_c): Likewise.
(*th_memidx_US_a): Likewise.
(*th_memidx_US_b): Likewise.
(*th_memidx_US_c): Likewise.
(*th_memidx_UZ_a): Likewise.
(*th_memidx_UZ_b): Likewise.
(*th_memidx_UZ_c): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadmemidx-helpers.h: New test.
* gcc.target/riscv/xtheadmemidx-index-update.c: New test.
* gcc.target/riscv/xtheadmemidx-index-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadmemidx-index-xtheadbb.c: New test.
* gcc.target/riscv/xtheadmemidx-index.c: New test.
* gcc.target/riscv/xtheadmemidx-modify-xtheadbb.c: New test.
* gcc.target/riscv/xtheadmemidx-modify.c: New test.
* gcc.target/riscv/xtheadmemidx-uindex-update.c: New test.
* gcc.target/riscv/xtheadmemidx-uindex-xtheadbb-update.c: New test.
* gcc.target/riscv/xtheadmemidx-uindex-xtheadbb.c: New test.
* gcc.target/riscv/xtheadmemidx-uindex.c: New test.
Currently we have documentation for __builtin_vec_bcdsub_{eq,gt,lt} but
not for __builtin_bcdsub_{ge,le}; this patch supplements the descriptions
for them. Although they are mainly for __builtin_bcdcmp{ge,le} and we already
have some testing coverage for __builtin_vec_bcdsub_{eq,gt,lt}, this patch
adds the corresponding explicit test cases as well.
gcc/ChangeLog:
* doc/extend.texi (__builtin_bcdsub_le, __builtin_bcdsub_ge): Add
documentation for the built-ins.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/bcd-3.c (do_sub_ge, do_suble): Add functions
to test builtins __builtin_bcdsub_ge and __builtin_bcdsub_le.
Neal Frager [Mon, 30 Oct 2023 17:02:53 +0000 (17:02 +0000)]
gcc: config: microblaze: fix cpu version check
The MICROBLAZE_VERSION_COMPARE was incorrectly using strcasecmp
instead of strverscmp to check the mcpu version against feature
options. By simply changing the define to use strverscmp,
the new version 10.0 is treated correctly as a higher version
than previous versions.
Fix incorrect warning with -mcpu=10.0:
warning: '-mxl-multiply-high' can be used only with
'-mcpu=v6.00.a' or greater
Signed-off-by: Neal Frager <neal.frager@amd.com>
Signed-off-by: Michael J. Eager <eager@eagercon.com>
[RA]: Fixing LRA cycling for multi-reg variable containing a fixed reg
PR111971 test case uses a multi-reg variable containing a fixed reg. LRA
rejects such multi-reg because of this when matching the constraint for
an asm insn. The rejection results in LRA cycling. The patch fixes this issue.
gcc/ChangeLog:
PR rtl-optimization/111971
* lra-constraints.cc (process_alt_operands): Don't check start
hard regs for regs originating from register variables.
gcc/testsuite/ChangeLog:
PR rtl-optimization/111971
* gcc.target/powerpc/pr111971.c: New test.
Robin Dapp [Fri, 27 Oct 2023 11:58:05 +0000 (13:58 +0200)]
RISC-V: Add vector fmin/fmax expanders.
This patch adds expanders for fmin and fmax. As per RISC-V V Spec 1.0
vfmin/vfmax are IEEE 754-2019 compliant, which differs from the IEEE 754-2008
behavior that fmin/fmax require (particularly in the signaling-NaN handling).
Therefore the pattern conditions include a !HONOR_SNANS.
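A hedged example of a loop these expanders let the vectorizer handle (a hypothetical
testcase shape):
void
fmin_loop (double *restrict r, double *restrict a, double *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = __builtin_fmin (a[i], b[i]);   /* candidate for the new fmin expander */
}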
Robin Dapp [Thu, 12 Oct 2023 09:23:26 +0000 (11:23 +0200)]
genemit: Split insn-emit.cc into several partitions.
On riscv, insn-emit.cc has grown to over 1.2 million lines of code and
compiling it takes considerable time.
Therefore, this patch adjusts genemit to create several partitions
(insn-emit-1.cc to insn-emit-n.cc). The available patterns are
written to the given files in a sequential fashion.
Similar to match.pd, a configure option --with-emitinsn-partitions=num
is introduced that makes the number of partitions configurable.
Alexandre Oliva [Tue, 31 Oct 2023 12:32:08 +0000 (09:32 -0300)]
hardcfr: support checking at abnormal edges [PR111943]
Control flow redundancy may choose abnormal edges for early checking,
but that breaks because we can't insert checks on such edges.
Introduce conditional checking on the dest block of abnormal edges,
and leave it for the optimizer to drop the conditional.
for gcc/ChangeLog
PR tree-optimization/111943
* gimple-harden-control-flow.cc: Adjust copyright year.
(rt_bb_visited): Add vfalse and vtrue data members.
Zero-initialize them in the ctor.
(rt_bb_visited::insert_exit_check_on_edge): Upon encountering
abnormal edges, insert initializers for vfalse and vtrue on
entry, and insert the check sequence guarded by a conditional
in the dest block.
Richard Biener [Tue, 31 Oct 2023 09:13:13 +0000 (10:13 +0100)]
tree-optimization/112305 - SCEV cprop and conditional undefined overflow
The following adjusts final value replacement to also rewrite the
replacement to defined overflow behavior if there are conditionally
evaluated stmts (with possibly undefined overflow), not only when
we "folded casts". The patch hooks into expression_expensive for
this.
PR tree-optimization/112305
* tree-scalar-evolution.h (expression_expensive): Adjust.
* tree-scalar-evolution.cc (expression_expensive): Record
when we see a COND_EXPR.
(final_value_replacement_loop): When the replacement contains
a COND_EXPR, rewrite it to defined overflow.
* tree-ssa-loop-ivopts.cc (may_eliminate_iv): Adjust.
Iain Buclaw [Tue, 31 Oct 2023 11:20:02 +0000 (12:20 +0100)]
d: Clean-up unused variable assignments after interface change
The lowering done for invoking `new' on a single dimension array was
moved from the code generator to the front-end semantic pass in
r14-4996. This removes the detritus left behind in the code generator
from that deletion.
After this patch:
...
_30 = .COND_LEN_DIV (mask__31.16_61, vect__5.19_65, vect__7.22_69, vect_iftmp.27_77, _85, 0);
...
gcc/ChangeLog:
* gimple-match.h (gimple_match_op::gimple_match_op):
Add interfaces for more arguments.
(gimple_match_op::set_op): Add interfaces for more arguments.
* match.pd: Add support for combining cond_len_op + vec_cond.
Haochen Jiang [Tue, 31 Oct 2023 05:33:49 +0000 (13:33 +0800)]
Fix incorrect option mask and avx512cd target push
gcc/ChangeLog:
* config/i386/avx512cdintrin.h (target): Push evex512 for
avx512cd.
* config/i386/avx512vlintrin.h (target): Split avx512cdvl part
out from avx512vl.
* config/i386/i386-builtin.def (BDESC): Do not check evex512
for builtins not needed.
Lehua Ding [Tue, 31 Oct 2023 03:18:28 +0000 (11:18 +0800)]
RISC-V: Add the missed combine of [u]int64 -> _Float16 and vcond
Hi,
This patch lets the INT64 to FP16 conversion split into two smaller conversions
(INT64 -> FP32 and FP32 -> FP16) when expanding instead of delaying the
split to the split1 pass. This change makes it possible to combine
the FP32 to FP16 and vcond patterns, so we don't need to add a
combine pattern for INT64 to FP16 and vcond patterns.
Consider this code:
void
foo (_Float16 *__restrict r, int64_t *__restrict a, _Float16 *__restrict b,
int64_t *__restrict pred, int n)
{
for (int i = 0; i < n; i += 1)
{
r[i] = pred[i] ? (_Float16) a[i] : b[i];
}
}
Before this patch:
...
vfncvt.f.f.w v2,v2
vmerge.vvm v1,v1,v2,v0
vse16.v v1,0(a0)
...
After this patch:
...
vfncvt.f.f.w v1,v2,v0.t
vse16.v v1,0(a0)
...
gcc/ChangeLog:
* config/riscv/autovec.md (<float_cvt><mode><vnnconvert>2):
Change to define_expand.
define_split doesn't work since pass_combine assumes it produces at
most 2 insns after split, but here it produces 3 since we need to move
const0_rtx (V2HImode) to reg. The move insn can be eliminated later.
gcc/ChangeLog:
PR target/112276
* config/i386/mmx.md (*mmx_pblendvb_v8qi_1): Change
define_split to define_insn_and_split to handle
immediate_operand for comparison.
(*mmx_pblendvb_v8qi_2): Ditto.
(*mmx_pblendvb_<mode>_1): Ditto.
(*mmx_pblendvb_v4qi_2): Ditto.
(<code><mode>3): Remove define_split after it.
(<code>v8qi3): Ditto.
(<code><mode>3): Ditto.
(<code>v2hi3): Ditto.
gcc/testsuite/ChangeLog:
* g++.target/i386/part-vect-vcondhf.C: Adjust testcase.
* gcc.target/i386/pr112276.c: New test.
Andrew Pinski [Sat, 28 Oct 2023 02:23:52 +0000 (19:23 -0700)]
MATCH: Add some more value_replacement simplifications to match
This moves a few more value_replacement simplifications to match.
/* a == 1 ? b : a * b -> a * b */
/* a == 1 ? b : b / a -> b / a */
/* a == -1 ? b : a & b -> a & b */
Also adds a testcase to show we can catch these where value_replacement would not
(but other passes would).
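As a hedged source-level illustration of the first pattern (a hypothetical function,
not the added testcase):
int
f (int a, int b)
{
  return a == 1 ? b : a * b;   /* when a == 1, a * b equals b anyway, so this folds to a * b */
}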
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* match.pd (`a == 1 ? b : a OP b`): New pattern.
(`a == -1 ? b : a & b`): New pattern.
Andrew Pinski [Thu, 26 Oct 2023 22:07:53 +0000 (15:07 -0700)]
MATCH: first of the value replacement moving from phiopt
This moves a few simple patterns that are done in value replacement
in phiopt over to match.pd. Just the simple ones which might show up
in other code.
This allows some optimizations to happen even without depending
on sinking happening, and in some cases where phiopt is not
invoked (cond-1.c is an example there).
Changes since v1:
* v2: Add an extra testcase to showcase improvements at -O1.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* match.pd: (`a == 0 ? b : b + a`,
`a == 0 ? b : b - a`): New patterns.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/cond-1.c: New test.
* gcc.dg/tree-ssa/phi-opt-value-1.c: New test.
* gcc.dg/tree-ssa/phi-opt-value-1a.c: New test.
* gcc.dg/tree-ssa/phi-opt-value-2.c: New test.
Mayshao [Mon, 30 Oct 2023 21:19:12 +0000 (22:19 +0100)]
i386: Zhaoxin yongfeng enablement
Enable -march/-mtune=yongfeng. Costs and tunings are set according
to the characteristics of the processor. Add a new .md file to describe
yongfeng processor.
Martin Jambor [Mon, 30 Oct 2023 17:34:59 +0000 (18:34 +0100)]
ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157)
PR 111157 shows that IPA-modref and IPA-CP (when plugged into value
numbering) can optimize out a store both before a call (because the
call will overwrite it) and in the call (because the store is of the
same value) and by eliminating both create miscompilation.
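A hedged, much reduced sketch of the problematic shape (hypothetical code, not the
PR 111157 testcase):
struct S { int x; };
static void
callee (struct S *p)
{
  p->x = 1;   /* seen as storing a value already known to be there */
}
void
caller (struct S *p)
{
  p->x = 1;   /* modref kill: the call below overwrites it, so it looks removable */
  callee (p); /* ... but the store inside may also be elided as storing the same value */
}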
This patch fixes that by pruning any constants from the list of IPA-CP
aggregate value constants that it knows the contents of the memory can
be "killed." Unfortunately, doing so is tricky. First, IPA-modref
loads override kills and so only stores not loaded are truly not
necessary. Looking stuff up there means doing what most of what
modref_may_alias may do but doing exactly what it does is tricky
because it takes also aliasing into account and has bail-out counters.
To err on the side of caution in order to avoid this miscompilation we
have to prune a constant when in doubt. However, pruning can
interfere with the mechanism of how clone materialization
distinguishes between the cases when a parameter was entirely removed
and when it was both IPA-CPed and IPA-SRAed (in order to make up for
the removal in debug info, which can bump into an assert when
compiling g++.dg/torture/pr103669.C when we are not careful).
Therefore this patch:
1) marks constants that IPA-modref has in its kill list with a new
"killed" flag, and
2) prunes the list from entries with this flag after materialization
and IPA-CP transformation is done using the template introduced in
the previous patch
It does not try to look up anything in the load lists, this will be
done as a follow-up in order to ease review.
gcc/ChangeLog:
2023-10-27 Martin Jambor <mjambor@suse.cz>
PR ipa/111157
* ipa-prop.h (struct ipa_argagg_value): New flag killed.
* ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function.
(update_signature): Mark any IPA-CP aggregate constants at
positions known to be killed as killed. Move check that there is
clone_info after this pruning.
* ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag.
(ipa_argagg_value_list::push_adjusted_values): Clear the new flag.
(push_agg_values_from_plats): Likewise.
(ipa_push_agg_values_from_jfunc): Likewise.
(estimate_local_effects): Likewise.
(push_agg_values_for_index_from_edge): Likewise.
* ipa-prop.cc (write_ipcp_transformation_info): Stream the killed
flag.
(read_ipcp_transformation_info): Likewise.
(ipcp_get_aggregate_const): Update comment, assert that encountered
record does not have killed flag set.
(ipcp_transform_function): Prune all aggregate constants with killed
set.
gcc/testsuite/ChangeLog:
2023-09-18 Martin Jambor <mjambor@suse.cz>
PR ipa/111157
* gcc.dg/lto/pr111157_0.c: New test.
* gcc.dg/lto/pr111157_1.c: Second file of the same new test.
Martin Jambor [Mon, 30 Oct 2023 17:34:59 +0000 (18:34 +0100)]
ipa-cp: Templatize filtering of m_agg_values
PR 111157 points to another place where IPA-CP collected aggregate
compile-time constants need to be filtered, in addition to the one
place that already does this in ipa-sra. In order to re-use code,
this patch turns the common bit into a template.
The functionality is still covered by testcase gcc.dg/ipa/pr108959.c.
gcc/ChangeLog:
2023-09-13 Martin Jambor <mjambor@suse.cz>
PR ipa/111157
* ipa-prop.h (ipcp_transformation): New member function template
remove_argaggs_if.
* ipa-sra.cc (zap_useless_ipcp_results): Use remove_argaggs_if to
filter aggregate constants.
Patrick O'Neill [Mon, 30 Oct 2023 16:30:01 +0000 (09:30 -0700)]
RISC-V: Make rv32i_zcmp testcase more robust
GCC recently changed its register allocator, which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any s register in the range of 1-9 for cm.push and cm.popret insns.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rv32i_zcmp.c: Accept any register in the
range of 1-9 for cm.push and cm.popret insns.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Roger Sayle [Mon, 30 Oct 2023 16:21:28 +0000 (16:21 +0000)]
ARC: Convert (signed<<31)>>31 to -(signed&1) without barrel shifter.
This patch optimizes PR middle-end/101955 for the ARC backend. On ARC
CPUs with a barrel shifter, using two shifts is optimal as:
asl_s r0,r0,31
asr_s r0,r0,31
but without a barrel shifter, GCC -O2 -mcpu=em currently generates:
and r2,r0,1
ror r2,r2
add.f 0,r2,r2
sbc r0,r0,r0
With this patch, we now generate the smaller, faster,
non-flag-clobbering sequence:
bmsk_s r0,r0,0
neg_s r0,r0
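A representative source pattern (my own example, not taken from the
patch; the new testcase gcc.target/arc/pr101955.c may differ) is:
/* Sign-extend bit 0 of X across the whole 32-bit register.  GCC treats
   the signed shifts as two's-complement operations, so this is
   equivalent to -(x & 1): 0 stays 0, 1 becomes -1.  */
int
sign_extract_lsb (int x)
{
  return (x << 31) >> 31;
}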
2023-10-30 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/101955
* config/arc/arc.md (*extvsi_1_0): New define_insn_and_split
to convert sign extract of the least significant bit into an
AND $1 then a NEG when !TARGET_BARREL_SHIFTER.
gcc/testsuite/ChangeLog
PR middle-end/101955
* gcc.target/arc/pr101955.c: New test case.
Roger Sayle [Mon, 30 Oct 2023 16:17:42 +0000 (16:17 +0000)]
ARC: Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.
This patch overhauls the ARC backend's insn_cost target hook, and makes
some related improvements to rtx_costs, BRANCH_COST, etc. The primary
goal is to allow the backend to indicate that shifts and rotates are
slow (discouraged) when the CPU doesn't have a barrel shifter. I should
also acknowledge Richard Sandiford for inspiring the use of set_cost
in this rewrite of arc_insn_cost; this implementation borrows heavily
from the target hooks for AArch64 and ARM.
The motivating example is derived from PR rtl-optimization/110717.
struct S { int a : 5; };
unsigned int foo (struct S *p) {
return p->a;
}
With a barrel shifter, GCC -O2 generates the reasonable sequence of
two shifts by 27 bits. Whilst that simplification is sensible when the
CPU has a barrel shifter, it is a significant pessimization when these
shifts are implemented by loops. This combination can be prevented if
the backend provides accurate-ish estimates for insn_cost.
Previously, without a barrel shifter, GCC -O2 -mcpu=em generated:
foo: ldb_s r0,[r0]
mov lp_count,27
lp 2f
add r0,r0,r0
nop
2: # end single insn loop
mov lp_count,27
lp 2f
asr r0,r0
nop
2: # end single insn loop
j_s [blink]
which contains two loops and requires ~113 cycles to execute.
With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em instead
generates a sequence using the shorter shifts by 3 and a sign
extension, which requires only ~6 cycles.
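An illustrative reconstruction of that cheaper approach (my sketch, not
the patch's actual output) extracts the 5-bit field with shifts by 3
around a byte sign-extension instead of two shifts by 27:
/* BYTE is the value loaded by ldb_s; the bit-field occupies bits [4:0].  */
unsigned int
foo_cheap (unsigned char byte)
{
  signed char shifted = (signed char) (byte << 3); /* field sign bit -> bit 7  */
  return (unsigned int) (shifted >> 3);            /* arithmetic shift back  */
}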
2023-10-30 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
Provide reasonable values for SHIFTS and ROTATES by constant
bit counts depending upon TARGET_BARREL_SHIFTER.
(arc_insn_cost): Use insn attributes if the instruction is
recognized. Avoid calling get_attr_length for type "multi",
i.e. define_insn_and_split patterns without explicit type.
Fall back to set_rtx_cost for a single_set and to pattern_cost
otherwise.
* config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
(BRANCH_COST): Improve/correct definition.
(LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.
Roger Sayle [Mon, 30 Oct 2023 16:12:30 +0000 (16:12 +0000)]
ARC: Improved SImode shifts and rotates with -mswap.
This patch improves the code generated by the ARC back-end for CPUs
without a barrel shifter but with -mswap. The -mswap option provides
a SWAP instruction that implements SImode rotations by 16, but also
logical shift instructions (left and right) by 16 bits. Clearly these
are also useful building blocks for implementing shifts by 17, 18,
etc., which would otherwise require a loop.
As a representative example:
int shl20 (int x) { return x << 20; }
GCC with -O2 -mcpu=em -mswap would previously generate:
shl20: mov lp_count,10
lp 2f
add r0,r0,r0
add r0,r0,r0
2: # end single insn loop
j_s [blink]
Although both the original and the replacement sequence are four
instructions (excluding the j_s), the original takes ~22 cycles and the
replacement ~4 cycles.
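Conceptually (my illustration, not the backend's splitter code), a
shift by a count of 16 or more decomposes into the single-instruction
16-bit shift plus a handful of single-bit shifts:
unsigned int
shl_decomposed (unsigned int x, unsigned int n)
{
  if (n >= 16)
    {
      x <<= 16;   /* one LSL16 instruction with -mswap  */
      n -= 16;
    }
  while (n--)
    x += x;       /* remaining bits as single add r0,r0,r0 shifts  */
  return x;
}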
2023-10-30 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/arc/arc.cc (arc_split_ashl): Use lsl16 on TARGET_SWAP.
(arc_split_ashr): Use swap and sign-extend on TARGET_SWAP.
(arc_split_lshr): Use lsr16 on TARGET_SWAP.
(arc_split_rotl): Use swap on TARGET_SWAP.
(arc_split_rotr): Likewise.
* config/arc/arc.md (ANY_ROTATE): New code iterator.
(<ANY_ROTATE>si2_cnt16): New define_insn for alternate form of
swap instruction on TARGET_SWAP.
(ashlsi2_cnt16): Rename from *ashlsi16_cnt16 and move earlier.
(lshrsi2_cnt16): New define_insn for LSR16 instruction.
(*ashlsi2_cnt16): See above.
gcc/testsuite/ChangeLog
* gcc.target/arc/lsl16-1.c: New test case.
* gcc.target/arc/lsr16-1.c: Likewise.
* gcc.target/arc/swap-1.c: Likewise.
* gcc.target/arc/swap-2.c: Likewise.
Richard Ball [Mon, 30 Oct 2023 15:31:26 +0000 (15:31 +0000)]
arm: move the switch tables for Arm to the RO data section.
Follow-up to the patch "arm: Use deltas for Arm switch tables".
This patch moves the switch tables for Arm from the .text section
into the .rodata section.
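For context (a made-up example, not from the patch), a dense switch
such as the following is typically compiled to a jump table; with this
change the table is emitted into .rodata instead of .text, and the
dispatch code loads entries with ldr rather than forming addresses with
adr:
int
dispatch (int op)
{
  switch (op)
    {
    case 0: return 10;
    case 1: return 22;
    case 2: return 35;
    case 3: return 49;
    case 4: return 64;
    case 5: return 81;
    default: return -1;
    }
}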
gcc/ChangeLog:
* config/arm/aout.h: Change to use the Lrtx label.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove arm targets
from (!target_pure_code) condition.
(ADDR_VEC_ALIGN): Add align for tables in rodata section.
* config/arm/arm.cc (arm_output_casesi): Alter the function to include
.Lrtx label and remove adr instructions.
* config/arm/arm.md
(arm_casesi_internal): Use force_reg to generate ldr instructions that
would otherwise be out of range, and change the RTL to accommodate
force_reg. Additionally remove the unnecessary register temporary.
(casesi): Remove pure code check for Arm.
* config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Remove arm
targets from JUMP_TABLES_IN_TEXT_SECTION definition.
gcc/testsuite/ChangeLog:
* gcc.target/arm/arm-switchstatement.c: Alter the tests to
change adr instruction to ldr.
Jeevitha [Mon, 30 Oct 2023 09:07:07 +0000 (04:07 -0500)]
rs6000: Change bitwise xor to an equality operator [PR106907]
PR106907 reports a few warnings spotted by cppcheck. These warnings
relate to operator precedence that needs clarification. Instead of
using bitwise xor, the code now uses an equality check, which achieves
the same result. Additionally, comment indentation has been fixed.
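For illustration only (not the rs6000 source), the two forms are
equivalent on boolean operands, and the equality form sidesteps the
precedence concern cppcheck flags for '^':
/* Both return true exactly when A and B differ.  */
bool differs_xor (bool a, bool b) { return a ^ b; }
bool differs_neq (bool a, bool b) { return a != b; }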
Juzhe-Zhong [Sat, 28 Oct 2023 02:05:07 +0000 (10:05 +0800)]
RISC-V: Fix bugs of handling scalar of SEW64 vx instruction in RV32
sew64_scalar_helper handles the SEW64 vx instruction pattern on RV32
systems. According to the RVV ISA, we can directly use a SEW64 vx
instruction on an RV32 system since RV32 GPRs are 32 bits wide.
The root cause of the bug is that VLMAX handling was missed: the code
was written a long time ago, when callers were always intrinsics and no
VLMAX situation arose.
The following failures are all fixed by this patch:
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
Paul Thomas [Mon, 30 Oct 2023 07:12:40 +0000 (07:12 +0000)]
Fortran: Fix a problem with SELECT TYPE selectors [PR104555].
2023-10-30 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/104555
* resolve.cc (resolve_select_type): If the selector expression
has no class component references and the expression is a
derived type, copy the typespec of the symbol to that of the
expression.
gcc/testsuite/
PR fortran/104555
* gfortran.dg/pr104555.f90: New test.