Pan Li [Mon, 12 Jun 2023 12:07:24 +0000 (20:07 +0800)]
RISC-V: Fix one potential test failure for RVV vsetvl
The test will fail on below command with multi-thread like below. However,
it comes from one missed "Oz" option when check vsetvl.
make -j $(nproc) report RUNTESTFLAGS="rvv.exp riscv.exp"
To some reason, this failure cannot be reproduced by RUNTESTFLAGS="rvv.exp"
or make without -j option. We would like to fix it and root cause the
reason later.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Adjust test checking.
Pan Li [Mon, 12 Jun 2023 07:16:21 +0000 (15:16 +0800)]
RISC-V: Support RVV FP16 MISC vget/vset intrinsic API
This patch support the intrinsic API of FP16 ZVFHMIN vget/vset. From
the user's perspective, it is reasonable to do some get/set operations
for the vfloat16*_t types when only ZVFHMIN is enabled.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-types.def
(vfloat16m1_t): Add type to lmul1 ops.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add new test cases.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Likewise.
Richard Biener [Mon, 12 Jun 2023 12:09:45 +0000 (14:09 +0200)]
Fix disambiguation against .MASK_STORE
Alias analysis was treating .MASK_STORE as storing a full vector
which means we disambiguate against decls of smaller than vector size.
That's of course wrong and a similar issue was fixed for DSE already.
The following makes sure we set the size of the access to unknown
and only constrain max_size.
This fixes runtime execution FAILs of gfortran.dg/matmul_2.f90,
gfortran.dg/matmul_6.f90 and gfortran.dg/pr91577.f90 when using
AVX512 with full masked loop vectorization on Zen4.
* tree-ssa-alias.cc (call_may_clobber_ref_p_1): For
.MASK_STORE and friend set the size of the access to
unknown.
Juzhe-Zhong [Mon, 12 Jun 2023 02:41:02 +0000 (10:41 +0800)]
RISC-V: Add RVV narrow shift right lowering auto-vectorization
Optimize the following auto-vectorization codes:
void foo (int16_t * __restrict a, int32_t * __restrict b, int32_t c, int n)
{
for (int i = 0; i < n; i++)
a[i] = b[i] >> c;
}
Before this patch:
foo:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,ta,ma
vle32.v v1,0(a1)
vsetvli a4,zero,e32,m1,ta,ma
vsra.vx v1,v1,a2
vsetvli zero,zero,e16,mf2,ta,ma
slli a7,a5,2
vncvt.x.x.w v1,v1
slli a6,a5,1
vsetvli zero,a5,e16,mf2,ta,ma
sub a3,a3,a5
vse16.v v1,0(a0)
add a1,a1,a7
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret
After this patch:
foo:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,ta,ma
vle32.v v1,0(a1)
vsetvli a7,zero,e16,mf2,ta,ma
slli a6,a5,2
vnsra.wx v1,v1,a2
slli a4,a5,1
vsetvli zero,a5,e16,mf2,ta,ma
sub a3,a3,a5
vse16.v v1,0(a0)
add a1,a1,a6
add a0,a0,a4
bne a3,zero,.L3
.L5:
ret
gcc/ChangeLog:
* config/riscv/autovec-opt.md
(*v<any_shiftrt:optab><any_extend:optab>trunc<mode>): New pattern.
(*<any_shiftrt:optab>trunc<mode>): Ditto.
* config/riscv/autovec.md (<optab><mode>3): Change to
define_insn_and_split.
(v<optab><mode>3): Ditto.
(trunc<mode><v_double_trunc>2): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/narrow-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow_run-3.c: New test.
Kyrylo Tkachov [Mon, 12 Jun 2023 10:42:29 +0000 (11:42 +0100)]
simplify-rtx: Implement constant folding of SS_TRUNCATE, US_TRUNCATE
This patch implements RTL constant-folding for the SS_TRUNCATE and US_TRUNCATE codes.
The semantics are a clamping operation on the argument with the min and max of the narrow mode,
followed by a truncation. The signedness of the clamp and the min/max extrema is derived from
the signedness of the saturating operation.
We have a number of instructions in aarch64 that use SS_TRUNCATE and US_TRUNCATE to represent
their operations and we have pretty thorough runtime tests in gcc.target/aarch64/advsimd-intrinsics/vqmovn*.c.
With this patch the instructions are folded away at optimisation levels and the correctness checks still
pass.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Andre Vieira [Mon, 12 Jun 2023 09:30:39 +0000 (10:30 +0100)]
vect: Don't pass subtype to vect_widened_op_tree where not needed [PR 110142]
This patch fixes an issue introduced by
g:2f482a07365d9f4a94a56edd13b7f01b8f78b5a0, where a subtype was beeing passed
to vect_widened_op_tree, when no subtype was to be used. This lead to an
errorneous use of IFN_VEC_WIDEN_MINUS.
gcc/ChangeLog:
PR middle-end/110142
* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Don't pass
subtype to vect_widened_op_tree and remove subtype parameter, also
remove superfluous overloaded function definition.
(vect_recog_widen_plus_pattern): Remove subtype parameter and dont pass
to call to vect_recog_widen_op_pattern.
(vect_recog_widen_minus_pattern): Likewise.
liuhongt [Fri, 2 Jun 2023 04:38:17 +0000 (12:38 +0800)]
Add missing vec_pack/unpacks patterns for _Float16 <-> int/float conversion.
This patch only support optabs for vector modes whose lenth >= 128.
For 32/64-bit vector, they're more hanlded by BB vectorizer with
truncmn2/extendmn2/fix{,uns}_truncmn2.
gcc/ChangeLog:
* config/i386/sse.md (vec_pack<floatprefix>_float_<mode>): New expander.
(vec_unpack_<fixprefix>fix_trunc_lo_<mode>): Ditto.
(vec_unpack_<fixprefix>fix_trunc_hi_<mode>): Ditto.
(vec_unpacks_lo_<mode>): Ditto.
(vec_unpacks_hi_<mode>): Ditto.
(sse_movlhps_<mode>): New define_insn.
(ssse3_palignr<mode>_perm): Extend to V_128H.
(V_128H): New mode iterator.
(ssepackPHmode): New mode attribute.
(vunpck_extract_mode): Ditto.
(vpckfloat_concat_mode): Extend to VxSI/VxSF for _Float16.
(vpckfloat_temp_mode): Ditto.
(vpckfloat_op_mode): Ditto.
(vunpckfixt_mode): Extend to VxHF.
(vunpckfixt_model): Ditto.
(vunpckfixt_extract_mode): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/vec_pack_fp16-1.c: New test.
* gcc.target/i386/vec_pack_fp16-2.c: New test.
* gcc.target/i386/vec_pack_fp16-3.c: New test.
Jason Merrill [Tue, 6 Dec 2022 23:10:48 +0000 (18:10 -0500)]
c++: build initializer_list<string> in a loop [PR105838]
I previously applied this change in r13-4565 but reverted it due to
PR108071. That PR was then fixed by r13-4712, but I didn't re-apply this
change then because we weren't making the array static; since r14-1500 for
PR110070 we now make the initializer array static, so let's bring this back.
In situations where the maybe_init_list_as_range optimization isn't viable,
we can build an initializer_list<string> with a loop over a constant array
of string literals.
This is represented using a VEC_INIT_EXPR, which required adjusting a couple
of places that expected the initializer array to have the same type as the
target array and fixing build_vec_init not to undo our efforts.
PR c++/105838
gcc/cp/ChangeLog:
* call.cc (convert_like_internal) [ck_list]: Use
maybe_init_list_as_array.
* constexpr.cc (cxx_eval_vec_init_1): Init might have
a different type.
* tree.cc (build_vec_init_elt): Likewise.
* init.cc (build_vec_init): Handle from_array from a
TARGET_EXPR. Retain TARGET_EXPR of a different type.
Kewen Lin [Mon, 12 Jun 2023 06:08:22 +0000 (01:08 -0500)]
rs6000: Guard __builtin_{un,}pack_vector_int128 with vsx [PR109932]
As PR109932 shows, builtins __builtin_{un,}pack_vector_int128
should be guarded under vsx rather than power7, as their
corresponding bif patterns have the conditions TARGET_VSX
and VECTOR_MEM_ALTIVEC_OR_VSX_P (V1TImode). This patch is to
move __builtin_{un,}pack_vector_int128 to stanza vsx to ensure
their supports.
PR target/109932
gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_pack_vector_int128,
__builtin_unpack_vector_int128): Move from stanza power7 to vsx.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr109932-1.c: New test.
* gcc.target/powerpc/pr109932-2.c: New test.
Kewen Lin [Mon, 12 Jun 2023 06:07:52 +0000 (01:07 -0500)]
rs6000: Don't use TFmode for 128 bits fp constant in toc [PR110011]
As PR110011 shows, when encoding 128 bits fp constant into
toc, we adopts REAL_VALUE_TO_TARGET_LONG_DOUBLE which is
to find the first float mode with LONG_DOUBLE_TYPE_SIZE
bits of precision, it would be TFmode here. But the 128
bits fp constant can be with mode IFmode or KFmode, which
doesn't necessarily have the same underlying float format
as the one of TFmode, like this PR exposes, with option
-mabi=ibmlongdouble TFmode has ibm_extended_format while
KFmode has ieee_quad_format, mixing up the formats (the
encoding/decoding ways) would cause unexpected results.
This patch is to make it use constant's own mode instead
of TFmode for real_to_target call.
PR target/110011
gcc/ChangeLog:
* config/rs6000/rs6000.cc (output_toc): Use the mode of the 128-bit
floating constant itself for real_to_target call.
From the user's perspective, it is reasonable to do above operation
when only ZVFHMIN is enabled. This patch would like to add new test
cases to make sure the RVV FP16 vreinterpret works well as expected.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add test cases.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Ditto.
David Edelsohn [Sun, 4 Jun 2023 00:27:16 +0000 (20:27 -0400)]
aix: Debugging does not require a stack frame.
The rs6000 port has allocated a stack frame when debugging is enabled
on AIX since the earliest versions of the port. Apparently the
earliest versions of the debuggers for AIX had difficulty with stackless
frames.
Both AIX DBX and GDB support stackless frames on AIX, and IBM XLC,
OpenXL and LLVM for AIX do not generate an extraneous stack frame when
debugging is enabled. This patch updates the rs6000 stack info function
to not set the stack frame flag when debugging is enabled for AIX.
gcc/ChangeLog:
* config/rs6000/rs6000-logue.cc (rs6000_stack_info):
Do not require a stack frame when debugging is enabled for AIX.
In this other testcase from PR110122, during regeneration of the generic
lambda with V=Bar{}, substitution followed by coerce_template_parms for
A<V>'s template argument naturally yields a copy of V in terms of Bar's
(implicitly) defaulted copy constructor.
This however happens inside a template context so although we introduced
a use of the copy constructor, mark_used didn't actually synthesize it,
which causes subsequent constant evaluation of the template argument to
fail with:
nontype-class59.C: In instantiation of ‘void f() [with Bar V = Bar{Foo()}]’:
nontype-class59.C:22:11: required from here
nontype-class59.C:18:18: error: ‘constexpr Bar::Bar(const Bar&)’ used before its definition
We already make sure to instantiate templated constexpr functions needed
for constant evaluation (as per P0859R0). So this patch fixes this by
making us synthesize defaulted constexpr functions needed for constant
evaluation as well.
Here when substituting the injected class name A during regeneration of
the lambda, we find ourselves in lookup_template_class for A<V> with
V=_ZTAXtl3BarEE (i.e. the template parameter object for Foo{}). The call
to coerce_template_parms within then undesirably tries to make a copy of
this class NTTP argument, which fails because Foo is not copyable. But it
seems clear that this testcase shouldn't require copyability of Foo.
lookup_template_class has a shortcut for looking up the current class
scope, which would avoid the problematic coerce_template_parms call, but
the shortcut doesn't trigger because it only considers the innermost
class scope which in this case in the lambda type. So this patch fixes
this by extending the lookup_template_class shortcut to consider outer
class scopes too (and skipping over lambda types since they are never
specialized from lookup_template_class). We also need to avoid calling
coerce_template_parms when specializing a templated non-template nested
class for the first time (such as A::B in the testcase). Coercion should
be unnecessary there because the innermost arguments belong to the context
and so should have already been coerced.
PR c++/110122
gcc/cp/ChangeLog:
* pt.cc (lookup_template_class): Extend shortcut for looking up the
current class scope to consider outer class scopes too, and use
current_nonlambda_class_type instead of current_class_type. Only
call coerce_template_parms when specializing a primary template.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/nontype-class57.C: New test.
* g++.dg/cpp2a/nontype-class58.C: New test.
Georg-Johann Lay [Sun, 11 Jun 2023 11:54:14 +0000 (13:54 +0200)]
Use canonical form for reversed single-bit insertions after reload.
We now split almost all insns after reload in order to add clobber of REG_CC.
If insns are coming from insn combiner and there is no canonical form for
the respective arithmetic (like for reversed bit insertions), there is
no need to keep all these different representations after reload:
Instead of splitting such patterns to their clobber-REG_CC-analogon, we can
split to a canonical representation, which is insv_notbit for the present case.
This is a no-op change.
Georg-Johann Lay [Sat, 10 Jun 2023 21:21:13 +0000 (23:21 +0200)]
target/19907: Overhaul bit extractions.
o Logical right shift that shifts the MSB to position 0 can be performed in
such a way that the input operand constraint can be relaxed from "0" to "r".
This results in less register pressure. Moreover, no scratch register is
required in that case.
o The deprecated "extzv" pattern is replaced by "extzv<mode>" that allows
inputs of scalar integer modes of different sizes (1 up to 4 bytes).
o Existing patterns are adjusted to the more generic "extzv<mode>" pattern.
Some patterns are added as the middle-end has been reworked to spot
more bit-extraction opportunities.
o A C function is used to print the asm for bit extractions, which is more
convenient for complex output logic.
The generated code is still not optimal because RTL optimizers might still
prefer arithmetic like shift over bit-extractions. For test cases see
also PR36884 and PR55181.
gcc/
PR target/109907
* config/avr/avr.md (adjust_len) [extr, extr_not]: New elements.
(MSB, SIZE): New mode attributes.
(any_shift): New code iterator.
(*lshr<mode>3_split, *lshr<mode>3, lshr<mode>3)
(*lshr<mode>3_const_split): Add constraint alternative for
the case of shift-offset = MSB. Ditch "length" attribute.
(extzv<mode): New. replaces extzv. Adjust following patterns.
Use avr_out_extr, avr_out_extr_not to print asm.
(*extzv.subreg.<mode>, *extzv.<mode>.subreg, *extzv.xor)
(*extzv<mode>.ge, *neg.ashiftrt<mode>.msb, *extzv.io.lsr7): New.
* config/avr/constraints.md (C15, C23, C31, Yil): New
* config/avr/predicates.md (reg_or_low_io_operand)
(const7_operand, reg_or_low_io_operand)
(const15_operand, const_0_to_15_operand)
(const23_operand, const_0_to_23_operand)
(const31_operand, const_0_to_31_operand): New.
* config/avr/avr-protos.h (avr_out_extr, avr_out_extr_not): New.
* config/avr/avr.cc (avr_out_extr, avr_out_extr_not): New funcs.
(lshrqi3_out, lshrhi3_out, lshrpsi3_out, lshrsi3_out): Adjust
MSB case to new insn constraint "r" for operands[1].
(avr_adjust_insn_length) [ADJUST_LEN_EXTR_NOT, ADJUST_LEN_EXTR]:
Handle these cases.
(avr_rtx_costs_1): Adjust cost for a new pattern.
gcc/testsuite/
PR target/109907
* gcc.target/avr/pr109907.c: New test.
* gcc.target/avr/torture/pr109907-1.c: New test.
* gcc.target/avr/torture/pr109907-2.c: New test.
Juzhe-Zhong [Fri, 9 Jun 2023 23:11:43 +0000 (07:11 +0800)]
RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS
Address comments from Jeff.
This patch is to rework Phase 5 && Phase 6 of VSETVL PASS since Phase 5 && Phase 6
are quite messy and cause some bugs discovered by my downstream auto-vectorization
test-generator.
Before this patch.
Phase 5 is cleanup_insns is the function remove AVL operand dependency from each RVV instruction.
E.g. vadd.vv (use a5), after Phase 5, ====> vadd.vv (use const_int 0). Since "a5" is used in "vsetvl" instructions and
after the correct "vsetvl" instructions are inserted, each RVV instruction doesn't need AVL operand "a5" anymore. Then,
we remove this operand dependency helps for the following scheduling PASS.
Phase 6 is propagate_avl do the following 2 things:
1. Local && Global user vsetvl instructions optimization.
E.g.
vsetvli a2, a2, e8, mf8 ======> Change it into vsetvli a2, a2, e32, mf2
vsetvli zero,a2, e32, mf2 ======> eliminate
2. Optimize user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is not used by any instructions.
Since from Phase 1 ~ Phase 4 which inserts "vsetvli" instructions base on LCM which change the CFG, I re-new a new
RTL_SSA framework (which is more expensive than just using DF) for Phase 6 and optmize user vsetvli base on the new RTL_SSA.
There are 2 issues in Phase 5 && Phase 6:
1. local_eliminate_vsetvl_insn was introduced by @kito which can do better local user vsetvl optimizations better than
Phase 6 do, such approach doesn't need to re-new the RTL_SSA framework. So the local user vsetvli instructions optimizaiton
in Phase 6 is redundant and should be removed.
2. A bug discovered by my downstream auto-vectorization test-generator (I can't put the test in this patch since we are missing autovec
patterns for it so we can't use the upstream GCC directly reproduce such issue but I will remember put it back after I support the
necessary autovec patterns). Such bug is causing by using RTL_SSA re-new framework. The issue description is this:
Before Phase 6:
...
insn1: vsetlvi a3, 17 <========== generated by SELECT_VL auto-vec pattern.
slli a4,a3,3
...
insn2: vsetvli zero, a3, ...
load (use const_int 0, before Phase 5, it's using a3, but the use of "a3" is removed in Phase 5)
...
In Phase 6, we iterate to insn2, then get the def of "a3" which is the insn1.
insn2 is the vsetvli instruction inserted in Phase 4 which is not included in the RLT_SSA framework
even though we renew it (I didn't take a look at it and I don't think we need to now).
Base on this situation, the def_info of insn2 has the information "set->single_nondebug_insn_use ()"
which return true. Obviously, this information is not correct, since insn1 has aleast 2 uses:
1). slli a4,a3,3 2).insn2: vsetvli zero, a3, ... Then, the test generated by my downstream test-generator
execution test failed.
Conclusion of RTL_SSA framework:
Before this patch, we initialize RTL_SSA 2 times. One is at the beginning of the VSETVL PASS which is absolutely correct, the other
is re-new after Phase 4 (LCM) has incorrect information that causes bugs.
Besides, we don't like to initialize RTL_SSA second time it seems to be a waste since we just need to do a little optimization.
Base on all circumstances I described above, I rework and reorganize Phase 5 && Phase 6 as follows:
1. Phase 5 is called ssa_post_optimization which is doing the optimization base on the RTL_SSA information (The RTL_SSA is initialized
at the beginning of the VSETVL PASS, no need to re-new it again). This phase includes 3 optimizaitons:
1). local_eliminate_vsetvl_insn we already have (no change).
2). global_eliminate_vsetvl_insn ---> new optimizaiton splitted from orignal Phase 6 but with more powerful and reliable implementation.
E.g.
void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
size_t avl;
if (m > 100)
avl = __riscv_vsetvl_e16mf4(vl << 4);
else
avl = __riscv_vsetvl_e32mf2(vl >> 8);
for (size_t i = 0; i < m; i++) {
vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl);
v0 = __riscv_vadd_vv_i8mf8 (v0, v0, avl);
__riscv_vse8_v_i8mf8(out + i, v0, avl);
}
}
This example failed to global user vsetvl optimize before this patch:
f:
li a5,100
bleu a3,a5,.L2
slli a2,a2,4
vsetvli a4,a2,e16,mf4,ta,mu
.L3:
li a5,0
vsetvli zero,a4,e8,mf8,ta,ma
.L5:
add a6,a0,a5
add a2,a1,a5
vle8.v v1,0(a6)
addi a5,a5,1
vadd.vv v1,v1,v1
vse8.v v1,0(a2)
bgtu a3,a5,.L5
.L10:
ret
.L2:
beq a3,zero,.L10
srli a2,a2,8
vsetvli a4,a2,e32,mf2,ta,mu
j .L3
With this patch:
f:
li a5,100
bleu a3,a5,.L2
slli a2,a2,4
vsetvli zero,a2,e8,mf8,ta,ma
.L3:
li a5,0
.L5:
add a6,a0,a5
add a2,a1,a5
vle8.v v1,0(a6)
addi a5,a5,1
vadd.vv v1,v1,v1
vse8.v v1,0(a2)
bgtu a3,a5,.L5
.L10:
ret
.L2:
beq a3,zero,.L10
srli a2,a2,8
vsetvli zero,a2,e8,mf8,ta,ma
j .L3
3). Remove AVL operand dependency of each RVV instructions.
2. Phase 6 is called df_post_optimization: Optimize "vsetvl a3,a2...." into Optimize "vsetvl zero,a2...." base on
dataflow analysis of new CFG (new CFG is created by LCM). The reason we need to do use new CFG and after Phase 5:
...
vsetvl a3, a2...
vadd.vv (use a3)
If we don't have Phase 5 which removes the "a3" use in vadd.vv, we will fail to optimize vsetvl a3,a2 into vsetvl zero,a2.
This patch passed all tests in rvv.exp with ONLY peformance && codegen improved (no performance decline and no bugs including my
downstream tests).
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (available_occurrence_p): Enhance user vsetvl optimization.
(vector_insn_info::parse_insn): Add rtx_insn parse.
(pass_vsetvl::local_eliminate_vsetvl_insn): Enhance user vsetvl optimization.
(get_first_vsetvl): New function.
(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::cleanup_insns): Remove it.
(pass_vsetvl::ssa_post_optimization): New function.
(has_no_uses): Ditto.
(pass_vsetvl::propagate_avl): Remove it.
(pass_vsetvl::df_post_optimization): New function.
(pass_vsetvl::lazy_vsetvl): Rework Phase 5 && Phase 6.
* config/riscv/riscv-vsetvl.h: Adapt declaration.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/vsetvl-16.c: Adapt test.
* gcc.target/riscv/rvv/vsetvl/vsetvl-2.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-3.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-21.c: New test.
* gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: New test.
* gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: New test.
Aldy Hernandez [Wed, 17 May 2023 09:29:34 +0000 (11:29 +0200)]
Convert ipcp_vr_lattice to type agnostic framework.
This converts the lattice to store ranges in Value_Range instead of
value_range (*) to make it type agnostic, and adjust all users
accordingly.
I've been careful to make sure Value_Range never ends up on GC, since
it contains an int_range_max and can expand on-demand onto the heap.
Longer term storage for ranges should be done with vrange_storage, as
per the previous patch ("Provide an API for ipa_vr").
gcc/ChangeLog:
* ipa-cp.cc (ipcp_vr_lattice::init): Take type argument.
(ipcp_vr_lattice::print): Call dump method.
(ipcp_vr_lattice::meet_with): Adjust for m_vr being a
Value_Range.
(ipcp_vr_lattice::meet_with_1): Make argument a reference.
(ipcp_vr_lattice::set_to_bottom): Set varying for an unsupported
range.
(initialize_node_lattices): Pass type when appropriate.
(ipa_vr_operation_and_type_effects): Make type agnostic.
(ipa_value_range_from_jfunc): Same.
(propagate_vr_across_jump_function): Same.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
(evaluate_properties_for_edge): Same.
* ipa-prop.cc (ipa_vr::get_vrange): Same.
(ipcp_update_vr): Same.
* ipa-prop.h (ipa_value_range_from_jfunc): Same.
(ipa_range_set_and_normalize): Same.
testsuite: Cut down 27_io/basic_istream/.../94749.cc for simulators
The test wchar_t/94749.cc can take about 10 minutes on some
simulator/host combinations with char/94749.cc at a third of
that time. The cause is test05 which is quite heavy and
includes wrapping a 32-bit counter. Run it only for native
setups.
Georg-Johann Lay [Sat, 10 Jun 2023 19:47:53 +0000 (21:47 +0200)]
target/109650: Fix wrong code after cc0 -> CCmode transition.
This patch fixes a wrong-code bug in the wake of PR92729, the transition that
turned the AVR backend from cc0 to CCmode. In cc0, the insn that uses cc0 like
a conditional branch always follows the cc0 setter, which is no more the case
with CCmode where set and use of REG_CC might be in different basic blocks.
This patch removes the machine-dependent reorg pass in avr_reorg entirely.
It is replaced by a new, AVR specific mini-pass that runs prior to split2.
Canonicalization of comparisons away from the "difficult" codes GT[U] and LE[U]
is now mostly performed by implementing TARGET_CANONICALIZE_COMPARISON.
Moreover:
* Text peephole conditions get "dead_or_set_regno_p (*, REG_CC)" as needed.
* RTL peephole conditions get "peep2_regno_dead_p (*, REG_CC)" as needed.
* Conditional branches no more clobber REG_CC.
* insn output for compares looks ahead to determine the branch mode in use.
This needs also "dead_or_set_regno_p (*, REG_CC)".
* Add RTL peepholes for decrement-and-branch detection.
* Some of the patterns like "*cmphi.zero-extend.0" lost their
combine-ational part wit PR92729. Restore them.
Finally, it fixes some of the many indentation glitches left over from PR92729.
gcc/
PR target/109650
PR target/92729
* config/avr/avr-passes.def (avr_pass_ifelse): Insert new pass.
* config/avr/avr.cc (avr_pass_ifelse): New RTL pass.
(avr_pass_data_ifelse): New pass_data for it.
(make_avr_pass_ifelse, avr_redundant_compare, avr_cbranch_cost)
(avr_canonicalize_comparison, avr_out_plus_set_ZN)
(avr_out_cmp_ext): New functions.
(compare_condtition): Make sure REG_CC dies in the branch insn.
(avr_rtx_costs_1): Add computation of cbranch costs.
(avr_adjust_insn_length) [ADJUST_LEN_ADD_SET_ZN, ADJUST_LEN_CMP_ZEXT]:
[ADJUST_LEN_CMP_SEXT]Handle them.
(TARGET_CANONICALIZE_COMPARISON): New define.
(avr_simplify_comparison_p, compare_diff_p, avr_compare_pattern)
(avr_reorg_remove_redundant_compare, avr_reorg): Remove functions.
(TARGET_MACHINE_DEPENDENT_REORG): Remove define.
* config/avr/avr-protos.h (avr_simplify_comparison_p): Remove proto.
(make_avr_pass_ifelse, avr_out_plus_set_ZN, cc_reg_rtx)
(avr_out_cmp_zext): New Protos
* config/avr/avr.md (branch, difficult_branch): Don't split insns.
(*cbranchhi.zero-extend.0", *cbranchhi.zero-extend.1")
(*swapped_tst<mode>, *add.for.eqne.<mode>): New insns.
(*cbranch<mode>4): Rename to cbranch<mode>4_insn.
(define_peephole): Add dead_or_set_regno_p(insn,REG_CC) as needed.
(define_deephole2): Add peep2_regno_dead_p(*,REG_CC) as needed.
Add new RTL peepholes for decrement-and-branch and *swapped_tst<mode>.
Rework signtest-and-branch peepholes for *sbrx_branch<mode>.
(adjust_len) [add_set_ZN, cmp_zext]: New.
(QIPSI): New mode iterator.
(ALLs1, ALLs2, ALLs4, ALLs234): New mode iterators.
(gelt): New code iterator.
(gelt_eqne): New code attribute.
(rvbranch, *rvbranch, difficult_rvbranch, *difficult_rvbranch)
(branch_unspec, *negated_tst<mode>, *reversed_tst<mode>)
(*cmpqi_sign_extend): Remove insns.
(define_c_enum "unspec") [UNSPEC_IDENTITY]: Remove.
* config/avr/avr-dimode.md (cbranch<mode>4): Canonicalize comparisons.
* config/avr/predicates.md (scratch_or_d_register_operand): New.
* config/avr/constraints.md (Yxx): New constraint.
gcc/testsuite/
PR target/109650
* gcc.target/avr/torture/pr109650-1.c: New test.
* gcc.target/avr/torture/pr109650-2.c: New test.
* ieee/ieee_arithmetic.F90: Add IEEE_MIN_NUM, IEEE_MAX_NUM,
IEEE_MIN_NUM_MAG, and IEEE_MAX_NUM_MAG functions.
gcc/fortran/
* f95-lang.cc (gfc_init_builtin_functions): Add fmax() and
fmin() built-ins, and their variants.
* mathbuiltins.def: Add FMAX and FMIN built-ins.
* trans-intrinsic.cc (conv_intrinsic_ieee_minmax): New function.
(gfc_conv_ieee_arithmetic_function): Handle IEEE_MIN_NUM and
IEEE_MAX_NUM functions.
gcc/testsuite/
* gfortran.dg/ieee/minmax_1.f90: New test.
Xi Ruoyao [Fri, 2 Jun 2023 18:25:44 +0000 (02:25 +0800)]
libatomic: x86_64: Always try ifunc
We used to skip ifunc check when CX16 is available. But now we use
CX16+AVX+Intel/AMD for the "perfect" 16b load implementation, so CX16
alone is not a sufficient reason not to use ifunc (see PR104688).
This causes a subtle and annoying issue: when GCC is built with a
higher -march= setting in CFLAGS_FOR_TARGET, ifunc is disabled and
the worst (locked) implementation of __atomic_load_16 is always used.
There seems no good way to check if the CPU is Intel or AMD from
the built-in macros (maybe we can check every known model like __skylake,
__bdver2, ..., but it will be very error-prune and require an update
whenever we add the support for a new x86 model). The best thing we can
do seems "always try ifunc" here.
libatomic/ChangeLog:
* configure.tgt: For x86_64, always set try_ifunc=yes.
Tim Lange [Fri, 9 Jun 2023 18:07:33 +0000 (20:07 +0200)]
analyzer: Fix allocation size false positive on conjured svalue [PR109577]
Currently, the analyzer tries to prove that the allocation size is a
multiple of the pointee's type size. This patch reverses the behavior
to try to prove that the expression is not a multiple of the pointee's
type size. With this change, each unhandled case should be gracefully
considered as correct. This fixes the bug reported in PR 109577 by
Paul Eggert.
Regression-tested on Linux x86-64 with -m32 and -m64.
2023-06-09 Tim Lange <mail@tim-lange.me>
PR analyzer/109577
gcc/analyzer/ChangeLog:
* constraint-manager.cc (class sval_finder): Visitor to find
childs in svalue trees.
(constraint_manager::sval_constrained_p): Add new function to
check whether a sval might be part of an constraint.
* constraint-manager.h: Add sval_constrained_p function.
* region-model.cc (class size_visitor): Reverse behavior to not
emit a warning on not explicitly considered cases.
(region_model::check_region_size):
Adapt to size_visitor changes.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/allocation-size-2.c: Change expected output
and add new test case.
* gcc.dg/analyzer/pr109577.c: New test.
Pan Li [Fri, 9 Jun 2023 15:28:47 +0000 (23:28 +0800)]
RISC-V: Add test cases for RVV FP16 vreinterpret
This patch would like to add more tests for RVV FP16 vreinterpret, aka
vfloat16*_t <==> v{u}int16*_t.
There we allow FP16 vreinterpret in ZVFHMIN consider we have vle FP16 already.
It doesn't break anything in SPEC as there is no such vreinterpret insn.
From the user's perspective, it is reasonable to do some type convert
between vfloat16 and v{u}int16 when only ZVFHMIN is enabled.
This patch would like to add new test cases to make sure the RVV FP16
vreinterpret works well as expected.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add new cases.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Diito.
Juzhe-Zhong [Sat, 10 Jun 2023 00:37:37 +0000 (08:37 +0800)]
RISC-V: Enable select_vl for RVV auto-vectorization
Consider this following example:
void vec_add(int32_t *restrict c, int32_t *restrict a, int32_t *restrict b,
int N) {
for (long i = 0; i < N; i++) {
c[i] = a[i] + b[i];
}
}
After this patch:
vec_add:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,ta,ma
vle32.v v2,0(a1)
vle32.v v1,0(a2)
vsetvli a6,zero,e32,m1,ta,ma ===> redundant vsetvl.
slli a4,a5,2
vadd.vv v1,v1,v2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma ===> redundant vsetvl.
vse32.v v1,0(a0)
add a1,a1,a4
add a2,a2,a4
add a0,a0,a4
bne a3,zero,.L3
.L5:
ret
We can get close-to-optimal codegen but with some redundant vsetvls.
This is not the big issue which will be easily addressed in RISC-V backend.
I am going to add a standalone PASS "AVL propagation" (avlprop) to addresse
such issue.
gcc/ChangeLog:
* config/riscv/autovec.md (select_vl<mode>): New pattern.
* config/riscv/riscv-protos.h (expand_select_vl): New function.
* config/riscv/riscv-v.cc (expand_select_vl): Ditto.
Andrew MacLeod [Fri, 9 Jun 2023 16:58:57 +0000 (12:58 -0400)]
Provide a unified range-op table.
Create a table to prepare for unifying all operations into a single table.
Move any operators which only occur in one table to the approriate
initialization routine.
Provide a mixed header file for range-ops with multiple categories.
* range-op-float.cc (class float_table): Move to header.
(float_table::float_table): Move float only operators to...
(range_op_table::initialize_float_ops): Here.
* range-op-mixed.h: New.
* range-op.cc (integral_tree_table, pointer_tree_table): Moved
to top of file.
(float_tree_table): Moved from range-op-float.cc.
(unified_tree_table): New.
(unified_table::unified_table): New. Call initialize routines.
(get_op_handler): Check unified table first.
(range_op_handler::range_op_handler): Handle no type constructor.
(integral_table::integral_table): Move integral only operators to...
(range_op_table::initialize_integral_ops): Here.
(pointer_table::pointer_table): Move pointer only operators to...
(range_op_table::initialize_pointer_ops): Here.
* range-op.h (enum bool_range_state): Move to range-op-mixed.h.
(get_bool_state): Ditto.
(empty_range_varying): Ditto.
(relop_early_resolve): Ditto.
(class range_op_table): Add new init methods for range types.
(class integral_table): Move declaration to here.
(class pointer_table): Move declaration to here.
(class float_table): Move declaration to here.
Ju-Zhe Zhong [Fri, 9 Jun 2023 08:39:34 +0000 (16:39 +0800)]
VECT: Add SELECT_VL support
This patch address comments from Richard && Richi and rebase to trunk.
This patch is adding SELECT_VL middle-end support
allow target have target dependent optimization in case of
length calculation.
This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750
The SELECT_VL is same behavior as LLVM "get_vector_length" with
these following properties:
1. Only apply on single-rgroup.
2. non SLP.
3. adjust loop control IV.
4. adjust data reference IV.
5. allow non-vf elements processing in non-final iteration
David Malcolm [Fri, 9 Jun 2023 21:58:33 +0000 (17:58 -0400)]
analyzer: add caching to globals with initializers [PR110112]
PR analyzer/110112 notes that -fanalyzer is extremely slow on a source
file with large read-only static arrays, repeatedly building the
same compound_svalue representing the full initializer, and repeatedly
building svalues representing parts of the the full initialiazer.
This patch adds caches for both of these; together they reduce the time
taken by -fanalyzer -O2 on the testcase in the bug for an optimized
build:
91.2s : no caches (status quo)
32.4s : cache in decl_region::get_svalue_for_constructor
3.7s : cache in region::get_initial_value_at_main
3.1s : both caches (this patch)
gcc/analyzer/ChangeLog:
PR analyzer/110112
* region-model.cc (region_model::get_initial_value_for_global):
Move code to region::calc_initial_value_at_main.
* region.cc (region::get_initial_value_at_main): New function.
(region::calc_initial_value_at_main): New function, based on code
in region_model::get_initial_value_for_global.
(region::region): Initialize m_cached_init_sval_at_main.
(decl_region::get_svalue_for_constructor): Add a cache, splitting
out body to...
(decl_region::calc_svalue_for_constructor): ...this new function.
* region.h (region::get_initial_value_at_main): New decl.
(region::calc_initial_value_at_main): New decl.
(region::m_cached_init_sval_at_main): New field.
(decl_region::decl_region): Initialize m_ctor_svalue.
(decl_region::calc_svalue_for_constructor): New decl.
(decl_region::m_ctor_svalue): New field.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Ken Matsui [Tue, 23 May 2023 19:41:14 +0000 (12:41 -0700)]
libstdc++: use using instead of typedef for type_traits
Since the type_traits header is a C++11 header file, using can be used instead
of typedef. This patch provides more readability, especially for long type
names.
libstdc++-v3/ChangeLog:
* include/std/type_traits: Use using instead of typedef
Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Andrew MacLeod [Wed, 31 May 2023 21:02:00 +0000 (17:02 -0400)]
Relocate range_cast to header, and add a generic version.
Make range_cast inlinable by moving it to the header file.
Also trap if the destination is not capable of representing the cast type.
Add a generic version which can change range classes.. ie float to int.
* range-op.cc (range_cast): Move to...
* range-op.h (range_cast): Here and add generic a version.
Jason Merrill [Fri, 9 Jun 2023 14:37:35 +0000 (10:37 -0400)]
c++: fix 32-bit spaceship failures [PR110185]
Various spaceship tests failed after r14-1624. This turned out to be
because the comparison category classes return in memory on 32-bit targets,
and the synthesized operator<=> looks something like
if (auto v = a.x <=> b.x, v == 0); else return v;
if (auto v = a.y <=> b.y, v == 0); else return v;
etc.
so check_return_expr was trying to do NRVO for all the 'v' variables, and
now on subsequent returns we check to see if the previous NRV is still in
scope. But the NRVs didn't have names, so looking up name bindings crashed.
Fixed both by giving 'v' a name so we can NRVO the first one, and fixing the
test to give up if the old NRV has no name.
PR c++/110185
PR c++/58487
gcc/cp/ChangeLog:
* method.cc (build_comparison_op): Give retval a name.
* typeck.cc (check_return_expr): Fix for nameless variables.
Jason Merrill [Thu, 8 Jun 2023 23:31:18 +0000 (19:31 -0400)]
c++: diagnose auto in template arg
We were failing to diagnose this Concepts TS feature that didn't make it
into C++20 because the 'auto' was getting converted to a template parameter
before we checked for it. So also check in cp_parser_simple_type_specifier.
The code in cp_parser_template_type_arg that I initially expected to
diagnose this seems unreachable because cp_parser_type_id_1 already checks
auto.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_simple_type_specifier): Check for auto
in template argument.
(cp_parser_template_type_arg): Remove auto checking.
gcc/testsuite/ChangeLog:
* g++.dg/concepts/auto7.C: New test.
* g++.dg/concepts/auto7a.C: New test.
Marek Polacek [Thu, 8 Jun 2023 17:52:11 +0000 (13:52 -0400)]
doc: Clarification for -Wmissing-field-initializers
The manual is incorrect in saying that the option does not warn
about designated initializers, which it does in C++. Whether the
divergence in behavior is desirable is another thing, but let's
at least make the manual match the reality.
PR c/39589
PR c++/96868
gcc/ChangeLog:
* doc/invoke.texi: Clarify that -Wmissing-field-initializers doesn't
warn about designated initializers in C only.
Andrew Pinski [Wed, 7 Jun 2023 16:05:15 +0000 (09:05 -0700)]
Add Plus to the op list of `(zero_one == 0) ? y : z <op> y` pattern
This adds plus to the op list of `(zero_one == 0) ? y : z <op> y` patterns
which currently has bit_ior and bit_xor.
This shows up now in GCC after the boolization work that Uroš has been doing.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Andrew Pinski [Wed, 7 Jun 2023 14:43:50 +0000 (07:43 -0700)]
Change the `(zero_one ==/!= 0) ? y : z <op> y` patterns to use multiply rather than `(-zero_one) & z`
Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` already,
it is better if we don't do a secondary transformation. This reduces the extra
statements produced by match-and-simplify on the gimple level too.
gcc/ChangeLog:
* match.pd ((zero_one ==/!= 0) ? y : z <op> y): Use
multiply rather than negation/bit_and.
Andrew Pinski [Thu, 8 Jun 2023 21:25:51 +0000 (14:25 -0700)]
MATCH: Fix zero_one_valued_p not to match signed 1 bit integers
So for the attached testcase, we assumed that zero_one_valued_p would
be the value [0,1] but currently zero_one_valued_p matches also
signed 1 bit integers.
This changes that not to match that and fixes the 2 new testcases at
all optimization levels.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Note the GCC 13 patch will be slightly different due to the changes
made to zero_one_valued_p.
Lehua Ding [Fri, 9 Jun 2023 13:27:01 +0000 (07:27 -0600)]
testsuite: fix the condition bug in tsvc s176
This patch fixes the problem that the loop in the tsvc s176 function is
optimized and removed because `iterations/LEN_1D` is 0 (where iterations
is set to 10000, LEN_1D is set to 32000 in tsvc.h).
Jonathan Wakely [Fri, 9 Jun 2023 10:08:03 +0000 (11:08 +0100)]
libstdc++: Remove duplicate definition of _Float128 std::from_chars [PR110077]
When long double uses IEEE binary128 representation we define the
_Float128 overload of std::from_chars inline in <charconv>. My changes
in r14-1431-g7037e7b6e4ac41 cause it to also be defined non-inline in
the library, leading to an abi-check failure for (at least) sparc and
aarch64.
Suppress the definition in the library if long double and _Float128 have
are both IEEE binary128.
libstdc++-v3/ChangeLog:
PR libstdc++/110077
* src/c++17/floating_from_chars.cc (from_chars) <_Float128>:
Only define if _Float128 and long double have different
representations.
Jonathan Wakely [Fri, 9 Jun 2023 11:15:21 +0000 (12:15 +0100)]
libstdc++: Add preprocessor checks to <experimental/internet> [PR100285]
We can't define endpoints and resolvers without the relevant OS support.
If IPPROTO_TCP and IPPROTO_UDP are both udnefined then we won't need
basic_endpoint and basic_resovler anyway, so make them depend on those
macros.
libstdc++-v3/ChangeLog:
PR libstdc++/100285
* include/experimental/internet [IPPROTO_TCP || IPPROTO_UDP]
(basic_endpoint, basic_resolver_entry, resolver_base)
(basic_resolver_results, basic_resolver): Only define if the tcp
or udp protocols will be defined.
I had intended to support the P2510R3 proposal unconditionally in C++20
mode, but I left it half implemented. The parse function supported the
new extensions, but the format function didn't.
This adds the missing pieces, and makes it only enabled for C++26 and
non-strict modes.
libstdc++-v3/ChangeLog:
PR libstdc++/110149
* include/std/format (formatter<const void*, charT>::parse):
Only alow 0 and P for C++26 and non-strict modes.
(formatter<const void*, charT>::format): Use toupper for P
type, and insert zero-fill characters for 0 option.
* testsuite/std/format/functions/format.cc: Check pointer
formatting. Only check P2510R3 extensions conditionally.
* testsuite/std/format/parse_ctx.cc: Only check P2510R3
extensions conditionally.
Jonathan Wakely [Thu, 8 Jun 2023 11:24:43 +0000 (12:24 +0100)]
libstdc++: Optimize std::to_array for trivial types [PR110167]
As reported in PR libstdc++/110167, std::to_array compiles extremely
slowly for very large arrays. It needs to instantiate a very large
specialization of std::index_sequence<N...> and then create a very large
aggregate initializer from the pack expansion. For trivial types we can
simply default-initialize the std::array and then use memcpy to copy the
values. For non-trivial types we need to use the existing
implementation, despite the compilation cost.
As also noted in the PR, using a generic lambda instead of the
__to_array helper compiles faster since gcc-13. It also produces
slightly smaller code at -O1, due to additional inlining. The code at
-Os, -O2 and -O3 seems to be the same. This new implementation requires
__cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported
since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1).
libstdc++-v3/ChangeLog:
PR libstdc++/110167
* include/std/array (to_array): Initialize arrays of trivial
types using memcpy. For non-trivial types, use lambda
expressions instead of a separate helper function.
(__to_array): Remove.
* testsuite/23_containers/array/creation/110167.cc: New test.
Richard Biener [Fri, 9 Jun 2023 07:29:09 +0000 (09:29 +0200)]
middle-end/110182 - TYPE_PRECISION on VECTOR_TYPE causes wrong-code
When folding two conversions in a row we use TYPE_PRECISION but
that's invalid for VECTOR_TYPE. The following fixes this by
using element_precision instead.
* match.pd (two conversions in a row): Use element_precision
to DTRT for VECTOR_TYPE.
Jonathan Wakely [Thu, 8 Jun 2023 11:19:26 +0000 (12:19 +0100)]
libstdc++: Improve tests for emplace member of sequence containers
Our existing tests for std::deque::emplace, std::list::emplace and
std::vector::emplace are poor. We only have compile tests for PR 52799
and the equivalent for a const_iterator as the insertion point. This
fails to check that the value is actually inserted correctly and the
right iterator is returned.
Add new tests that cover the existing 52799.cc and const_iterator.cc
compile-only tests, as well as verifying the effects are correct.
libstdc++-v3/ChangeLog:
* testsuite/23_containers/deque/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/deque/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/list/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/list/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/vector/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/vector/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/deque/modifiers/emplace/1.cc: New
test.
* testsuite/23_containers/list/modifiers/emplace/1.cc: New
test.
* testsuite/23_containers/vector/modifiers/emplace/1.cc: New
test.
Pan Li [Fri, 9 Jun 2023 03:19:12 +0000 (11:19 +0800)]
RISC-V: Refactor requirement of ZVFH and ZVFHMIN.
This patch would like to refactor the requirement of both the ZVFH
and ZVFHMIN. By default, the ZVFHMIN will enable FP16 for all the
iterators of RVV. And then the ZVFH will leverage one define attr as
the gate for FP16 supported or not.
Please note the ZVFH will cover the ZVFHMIN instructions. This patch
add one test for this.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Co-Authored by: Kito Cheng <kito.cheng@sifive.com>
gcc/ChangeLog:
* config/riscv/riscv.md (enabled): Move to another place, and
add fp_vector_disabled to the cond.
(fp_vector_disabled): New attr defined for disabling fp.
* config/riscv/vector-iterators.md: Fix V_WHOLE and V_FRACT.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add vle16 test
for ZVFHMIN.
Jakub Jelinek [Fri, 9 Jun 2023 07:10:29 +0000 (09:10 +0200)]
fortran: Fix ICE on pr96024.f90 on big-endian hosts [PR96024]
The pr96024.f90 testcase ICEs on big-endian hosts. The problem is
that length->val.integer is accessed after checking
length->expr_type == EXPR_CONSTANT, but it is a CHARACTER constant
which uses length->val.character union member instead and on big-endian
we end up reading constant 0x100000000 rather than some small number
on little-endian and if target doesn't have enough memory for 4 times
that (i.e. 16GB allocation), it ICEs.
2023-06-09 Jakub Jelinek <jakub@redhat.com>
PR fortran/96024
* primary.cc (gfc_convert_to_structure_constructor): Only do
constant string ctor length verification and truncation/padding
if constant length has INTEGER type.
liuhongt [Mon, 5 Jun 2023 04:38:41 +0000 (12:38 +0800)]
Explicitly view_convert_expr mask to signed type when folding pblendvb builtins.
Since mask < 0 will be always false for vector char when
-funsigned-char, but vpblendvb needs to check the most significant
bit. The patch explicitly VCE to vector signed char.
gcc/ChangeLog:
PR target/110108
* config/i386/i386.cc (ix86_gimple_fold_builtin): Explicitly
view_convert_expr mask to signed type when folding pblendvb
builtins.
liuhongt [Mon, 5 Jun 2023 03:59:33 +0000 (11:59 +0800)]
Fold _mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple ABSU_EXPR + VCE.
r14-1145 fold the intrinsics into gimple ABS_EXPR which has UB for
TYPE_MIN, but PABSB will store unsigned result into dst. The patch
uses ABSU_EXPR + VCE instead of ABS_EXPR.
Also don't fold _mm_abs_{pi8,pi16,pi32} w/o TARGET_64BIT since 64-bit
vector absm2 is guarded with TARGET_MMX_WITH_SSE.
gcc/ChangeLog:
PR target/110108
* config/i386/i386.cc (ix86_gimple_fold_builtin): Fold
_mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple
ABSU_EXPR + VCE, don't fold _mm_abs_{pi8,pi16,pi32} w/o
TARGET_64BIT.
* config/i386/i386-builtin.def: Replace CODE_FOR_nothing with
real codename for __builtin_ia32_pabs{b,w,d}.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110108.c: New test.
* gcc.target/i386/pr110108-3.c: New test.
* gcc.target/i386/pr109900.c: Adjust testcase.
Gaius Mulley [Thu, 8 Jun 2023 23:55:50 +0000 (00:55 +0100)]
PR modula2/110126 variables are reported as unused when referenced by ASM
This patches fixes two problems with the asm statement.
gm2 -Wall -c fooasm3.mod generates an incorrect warning and
gm2 cannot concatenate strings before an ASM statement.
The asm statement now accepts a constant expression (rather than
a string) and it updates the variable read/write use lists as
appropriate.
gcc/m2/ChangeLog:
PR modula2/110126
* gm2-compiler/M2GenGCC.mod (BuildTreeFromInterface): Remove
tokenno parameter. Use object tok instead of tokenno.
(BuildTrashTreeFromInterface): Use object tok instead of
GetDeclaredMod.
(CodeInline): Remove tokenno from parameter list to BuildTreeFromInterface.
* gm2-compiler/M2Quads.def (BuildAsmElement): Exported and
defined.
* gm2-compiler/M2Quads.mod (BuildOptimizeOff): Reformatted.
(BuildInline): Reformatted.
(BuildLineNo): Reformatted.
(UseLineNote): Reformatted.
(BuildAsmElement): New procedure.
* gm2-compiler/P0SyntaxCheck.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/P1Build.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/P2Build.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/P3Build.bnf (AsmOperands): Rewrite.
(AsmOperandSpec): Rewrite.
(AsmOutputList): New rule.
(AsmInputList): New rule.
(TrashList): Rewrite.
* gm2-compiler/PCBuild.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/PHBuild.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/SymbolTable.def (PutRegInterface):
Rewrite interface.
(GetRegInterface): Rewrite interface.
* gm2-compiler/SymbolTable.mod (SetFirstUsed): New procedure.
(PutFirstUsed): New procedure.
(PutRegInterface): Rewrite.
(GetRegInterface): Rewrite.
gcc/testsuite/ChangeLog:
PR modula2/110126
* gm2/pim/pass/fooasm3.mod: New test.
Andrew MacLeod [Wed, 31 May 2023 17:10:31 +0000 (13:10 -0400)]
Provide a new dispatch mechanism for range-ops.
Simplify range_op_handler to have a single range_operator pointer and
provide a more flexible dispatch mechanism for calls via generic vrange
classes. This is more extensible for adding new classes of range support.
Any unsupported dispatch patterns will simply return FALSE now rather
than generating compile time exceptions, aleviating the need to
constantly check for supoprted types.
* gimple-range-op.cc
(gimple_range_op_handler::gimple_range_op_handler): Adjust.
(gimple_range_op_handler::maybe_builtin_call): Adjust.
* gimple-range-op.h (operand1, operand2): Use m_operator.
* range-op.cc (integral_table, pointer_table): Relocate.
(get_op_handler): Rename from get_handler and handle all types.
(range_op_handler::range_op_handler): Relocate.
(range_op_handler::set_op_handler): Relocate and adjust.
(range_op_handler::range_op_handler): Relocate.
(dispatch_trio): New.
(RO_III, RO_IFI, RO_IFF, RO_FFF, RO_FIF, RO_FII): New consts.
(range_op_handler::dispatch_kind): New.
(range_op_handler::fold_range): Relocate and Use new dispatch value.
(range_op_handler::op1_range): Ditto.
(range_op_handler::op2_range): Ditto.
(range_op_handler::lhs_op1_relation): Ditto.
(range_op_handler::lhs_op2_relation): Ditto.
(range_op_handler::op1_op2_relation): Ditto.
(range_op_handler::set_op_handler): Use m_operator member.
* range-op.h (range_op_handler::operator bool): Use m_operator.
(range_op_handler::dispatch_kind): New.
(range_op_handler::m_valid): Delete.
(range_op_handler::m_int): Delete
(range_op_handler::m_float): Delete
(range_op_handler::m_operator): New.
(range_op_table::operator[]): Relocate from .cc file.
(range_op_table::set): Ditto.
* value-range.h (class vrange): Make range_op_handler a friend.
Andrew MacLeod [Wed, 31 May 2023 16:31:53 +0000 (12:31 -0400)]
Unify range_operators to one class.
Range_operator and range_operator_float are 2 different classes, making
generalized dispatch difficult. The distinction between what is a float
operator and what is an integral operator also blurs when some methods
have multiple types. ie, casts : INT = FLOAT and FLOAT = INT
This patch unifies all possible invocation patterns in one class, and
switches the float table to use the general range_op_table.
* gimple-range-op.cc (cfn_constant_float_p): Change base class.
(cfn_pass_through_arg1): Adjust using statemenmt.
(cfn_signbit): Change base class, adjust using statement.
(cfn_copysign): Ditto.
(cfn_sqrt): Ditto.
(cfn_sincos): Ditto.
* range-op-float.cc (fold_range): Change class to range_operator.
(rv_fold): Ditto.
(op1_range): Ditto
(op2_range): Ditto
(lhs_op1_relation): Ditto.
(lhs_op2_relation): Ditto.
(op1_op2_relation): Ditto.
(foperator_*): Ditto.
(class float_table): New. Inherit from range_op_table.
(floating_tree_table) Change to range_op_table pointer.
(class floating_op_table): Delete.
* range-op.cc (operator_equal): Adjust using statement.
(operator_not_equal): Ditto.
(operator_lt, operator_le, operator_gt, operator_ge): Ditto.
(operator_minus, operator_cast): Ditto.
(operator_bitwise_and, pointer_plus_operator): Ditto.
(get_float_handle): Change return type.
* range-op.h (range_operator_float): Delete. Relocate all methods
into class range_operator.
(range_op_handler::m_float): Change type to range_operator.
(floating_op_table): Delete.
(floating_tree_table): Change type.
Andrew MacLeod [Wed, 31 May 2023 14:55:28 +0000 (10:55 -0400)]
Remove tree_code from range-operator.
Range_operator had a tree code added last release to facilitate
bitmask operations. This removes the tree_code and replaces it with a
virtual routine to peform the masking. Remove any duplicate instances
which are no longer needed.
Andrew MacLeod [Wed, 7 Jun 2023 18:03:35 +0000 (14:03 -0400)]
Fix floating point bug in fold_range.
We currently do not have any floating point operators where operand 1 is
a different type than the LHS. When we eventually do there is a bug
in fold_range. If either operand is a known NAN, it returns a NAN
of the type of operand 1 instead of the result type.
* range-op-float.cc (range_operator_float::fold_range): Return
NAN of the result type.
This patch enhances -Wanalyzer-out-of-bounds that is no longer paired
with a -Wanalyzer-use-of-uninitialized-value on out-of-bounds-read.
This also fixes PR analyzer/109437.
Before there could always be at most one OOB-read warning per frame because
-Wanalyzer-use-of-uninitialized-value always terminates the analysis
path.
PR 109439
gcc/analyzer/ChangeLog:
* bounds-checking.cc (region_model::check_symbolic_bounds): Returns whether the BASE_REG
region access was OOB.
(region_model::check_region_bounds): Likewise.
* region-model.cc (region_model::get_store_value): Creates an
unknown svalue on OOB-read access to REG.
(region_model::check_region_access): Returns whether an unknown svalue needs be created.
(region_model::check_region_for_read): Passes check_region_access return value.
* region-model.h: Update prior function definitions.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-2.c: Cleaned test for uninitialized-value warning
* gcc.dg/analyzer/out-of-bounds-5.c: Likewise.
* gcc.dg/analyzer/pr101962.c: Likewise.
* gcc.dg/analyzer/realloc-5.c: Likewise.
* gcc.dg/analyzer/pr109439.c: New test.
Jakub Jelinek [Thu, 8 Jun 2023 08:13:23 +0000 (10:13 +0200)]
optabs: Implement double-word ctz and ffs expansion
We have expand_doubleword_clz for a couple of years, where we emit
double-word CLZ as if (high_word == 0) return CLZ (low_word) + word_size;
else return CLZ (high_word);
We can do something similar for CTZ and FFS IMHO, just with the 2
words swapped. So if (low_word == 0) return CTZ (high_word) + word_size;
else return CTZ (low_word); for CTZ and
if (low_word == 0) { return high_word ? FFS (high_word) + word_size : 0;
else return FFS (low_word);
The following patch implements that.
Note, on some targets which implement both word_mode ctz and ffs patterns,
it might be better to incrementally implement those double-word ffs expansion
patterns in md files, because we aren't able to optimize it correctly;
nothing can detect we have just made sure that argument is not 0 and so
don't need to bother with handling that case. So, on ia32 just using
CTZ patterns would be better there, but I think we can even do better and
instead of doing the comparisons of the operands against 0 do the CTZ
expansion followed by testing of flags.
2023-06-08 Jakub Jelinek <jakub@redhat.com>
* optabs.cc (expand_ffs): Add forward declaration.
(expand_doubleword_clz): Rename to ...
(expand_doubleword_clz_ctz_ffs): ... this. Add UNOPTAB argument,
handle also doubleword CTZ and FFS in addition to CLZ.
(expand_unop): Adjust caller. Also call it for doubleword
ctz_optab and ffs_optab.
* gcc.target/i386/ctzll-1.c: New test.
* gcc.target/i386/ffsll-1.c: New test.
Jakub Jelinek [Thu, 8 Jun 2023 08:11:25 +0000 (10:11 +0200)]
i386: Fix endless recursion in ix86_expand_vector_init_general with MMX [PR110152]
I'm getting
+FAIL: gcc.target/i386/3dnow-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/3dnow-1.c (test for excess errors)
+FAIL: gcc.target/i386/3dnow-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/3dnow-2.c (test for excess errors)
+FAIL: gcc.target/i386/mmx-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/mmx-1.c (test for excess errors)
+FAIL: gcc.target/i386/mmx-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/mmx-2.c (test for excess errors)
regressions on i686-linux since r14-1166. The problem is when
ix86_expand_vector_init_general is called with mmx_ok = true and
mode = V4HImode, it newly recurses with mmx_ok = false and mode = V2SImode,
but as mmx_ok is false and !TARGET_SSE, we recurse again with the same
arguments (ok, fresh new tmp and vals) infinitely.
The following patch fixes that by passing mmx_ok to that recursive call.
For n_words == 4 it isn't needed, because we only care about mmx_ok for
V2SImode or V2SFmode and no other modes.
2023-06-08 Jakub Jelinek <jakub@redhat.com>
PR target/110152
* config/i386/i386-expand.cc (ix86_expand_vector_init_general): For
n_words == 2 recurse with mmx_ok as first argument rather than false.
Paul Thomas [Thu, 8 Jun 2023 06:11:32 +0000 (07:11 +0100)]
Fortran: Fix some more blockers in associate meta-bug [PR87477]
2023-06-08 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/87477
PR fortran/99350
PR fortran/107821
PR fortran/109451
* decl.cc (char_len_param_value): Simplify a copy of the expr
and replace the original if there is no error.
* gfortran.h : Remove the redundant field 'rankguessed' from
'gfc_association_list'.
* resolve.cc (resolve_assoc_var): Remove refs to 'rankguessed'.
(resolve_variable): Associate names with constant or structure
constructor targets cannot have array refs.
* trans-array.cc (gfc_conv_expr_descriptor): Guard expression
character length backend decl before using it. Suppress the
assignment if lhs equals rhs.
* trans-io.cc (gfc_trans_transfer): Scalarize transfer of
associate variables pointing to a variable. Add comment.
* trans-stmt.cc (trans_associate_var): Remove requirement that
the character length be deferred before assigning the value
returned by gfc_conv_expr_descriptor. Also, guard the backend
decl before testing with VAR_P.
gcc/testsuite/
PR fortran/99350
* gfortran.dg/pr99350.f90 : New test.
Roger Sayle [Wed, 7 Jun 2023 22:40:56 +0000 (23:40 +0100)]
[Committed] Bug fix to new wi::bitreverse_large function.
Richard Sandiford was, of course, right to be warry of new code without
much test coverage. Converting the nvptx backend to use the BITREVERSE
rtx infrastructure, has resulted in far more exhaustive testing and
revealed a subtle bug in the new wi::bitreverse implementation. The
code needs to use HOST_WIDE_INT_1U (instead of 1) to avoid unintended
sign extension.
This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(with a minor tweak to use BITREVERSE), where it fixes regressions of
the 32-bit test vectors in gcc.target/nvptx/brev-2.c and the 64-bit
test vectors in gcc.target/nvptx/brevll-2.c. Committed as obvious.
2023-06-07 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* wide-int.cc (wi::bitreverse_large): Use HOST_WIDE_INT_1U to
avoid sign extension/undefined behaviour when setting each bit.
Roger Sayle [Wed, 7 Jun 2023 22:35:15 +0000 (23:35 +0100)]
Add support for stc and cmc instructions in i386.md
This patch is the latest revision of my patch to add support for the
STC (set carry flag) and CMC (complement carry flag) instructions to
the i386 backend, incorporating Uros' previous feedback. The significant
changes are (i) the inclusion of CMC, (ii) the use of UNSPEC for pattern,
(iii) Use of a new X86_TUNE_SLOW_STC tuning flag to use alternate
implementations on pentium4 (which has a notoriously slow STC) when
not optimizing for size.
An example of the use of the stc instruction is:
unsigned int foo (unsigned int a, unsigned int b, unsigned int *c) {
return __builtin_ia32_addcarryx_u32 (1, a, b, c);
}
with this patch now generates:
stc
adcl %esi, %edi
setc %al
movl %edi, (%rdx)
movzbl %al, %eax
ret
An example of the use of the cmc instruction (where the carry from
a first adc is inverted/complemented as input to a second adc) is:
unsigned int bar (unsigned int a, unsigned int b,
unsigned int c, unsigned int d)
{
unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2);
}
and now generates:
stc
adcl %esi, %edi
cmc
movl %edi, o1(%rip)
adcl %ecx, %edx
setc %al
movl %edx, o2(%rip)
movzbl %al, %eax
ret
This version implements Uros' suggestions/refinements. (i) Avoid the
UNSPEC_CMC by using the canonical RTL idiom for *x86_cmc, (ii) Use
peephole2s to convert x86_stc and *x86_cmc into alternate forms on
TARGET_SLOW_STC CPUs (pentium4), when a suitable QImode register is
available, (iii) Prefer the addqi_cconly_overflow idiom (addb $-1,%al)
over negqi_ccc_1 (neg %al) for setting the carry from a QImode value,
These changes required two minor edits to i386.cc: ix86_cc_mode had
to be tweaked to suggest CCCmode for the new *x86_cmc pattern, and
*x86_cmc needed to be handled/parameterized in ix86_rtx_costs so that
combine would appreciate that this complex RTL expression was actually
a fast, single byte instruction [i.e. preferable].
2022-06-07 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>:
Use new x86_stc instruction when the carry flag must be set.
* config/i386/i386.cc (ix86_cc_mode): Use CCCmode for *x86_cmc.
(ix86_rtx_costs): Provide accurate rtx_costs for *x86_cmc.
* config/i386/i386.h (TARGET_SLOW_STC): New define.
* config/i386/i386.md (UNSPEC_STC): New UNSPEC for stc.
(x86_stc): New define_insn.
(define_peephole2): Convert x86_stc into alternate implementation
on pentium4 without -Os when a QImode register is available.
(*x86_cmc): New define_insn.
(define_peephole2): Convert *x86_cmc into alternate implementation
on pentium4 without -Os when a QImode register is available.
(*setccc): New define_insn_and_split for a no-op CCCmode move.
(*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to
recognize (and eliminate) the carry flag being copied to itself.
(*setcc_qi_negqi_ccc_2_<mode>): Likewise.
* config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag.
gcc/testsuite/ChangeLog
* gcc.target/i386/cmc-1.c: New test case.
* gcc.target/i386/stc-1.c: Likewise.
Jason Merrill [Wed, 7 Jun 2023 09:15:02 +0000 (05:15 -0400)]
c++: allow NRV and non-NRV returns [PR58487]
Now that we support NRV from an inner block, we can also support non-NRV
returns from other blocks, since once the NRV is out of scope a later return
expression can't possibly alias it.
This fixes 58487 and half-fixes 53637: now one of the returns is elided, but
not the other.
Fixing the remaining xfails in these testcases will require a very different
approach, probably involving a full tree/block walk from finalize_nrv, and
check_return_expr only adding to a list of potential return variables.
PR c++/58487
PR c++/53637
gcc/cp/ChangeLog:
* cp-tree.h (INIT_EXPR_NRV_P): New.
* semantics.cc (finalize_nrv_r): Check it.
* name-lookup.h (decl_in_scope_p): Declare.
* name-lookup.cc (decl_in_scope_p): New.
* typeck.cc (check_return_expr): Allow non-NRV
returns if the NRV is no longer in scope.
gcc/testsuite/ChangeLog:
* g++.dg/opt/nrv26.C: New test.
* g++.dg/opt/nrv26a.C: New test.
* g++.dg/opt/nrv27.C: New test.
Jeff Law [Wed, 7 Jun 2023 19:40:16 +0000 (13:40 -0600)]
RISC-V: Eliminate extension after for *w instructions
This patch tries to prevent generating unnecessary sign extension
after *w instructions like "addiw" or "divw".
The main idea of it is to add SUBREG_PROMOTED fields during expanding.
I have tested on SPEC2017 there is no regression.
Only gcc.dg/pr30957-1.c test failed.
To solve that I did some changes in loop-iv.cc, but not sure that it is
suitable.
gcc/ChangeLog:
* config/riscv/bitmanip.md (rotrdi3, rotrsi3, rotlsi3): New expanders.
(rotrsi3_sext): Expose generator.
(rotlsi3 pattern): Hide generator.
* config/riscv/riscv-protos.h (riscv_emit_binary): New function
declaration.
* config/riscv/riscv.cc (riscv_emit_binary): Removed static
* config/riscv/riscv.md (addsi3, subsi3, negsi2): Hide generator.
(mulsi3, <optab>si3): Likewise.
(addsi3, subsi3, negsi2, mulsi3, <optab>si3): New expanders.
(addv<mode>4, subv<mode>4, mulv<mode>4): Use riscv_emit_binary.
(<u>mulsidi3): Likewise.
(addsi3_extended, subsi3_extended, negsi2_extended): Expose generator.
(mulsi3_extended, <optab>si3_extended): Likewise.
(splitter for shadd feeding divison): Update RTL pattern to account
for changes in how 32 bit ops are expanded for TARGET_64BIT.
* loop-iv.cc (get_biv_step_1): Process src of extension when it PLUS.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/shift-and-2.c: New tests.
* gcc.target/riscv/shift-shift-2.c: Adjust expected output.
* gcc.target/riscv/sign-extend.c: New test.
* gcc.target/riscv/zbb-rol-ror-03.c: Adjust expected output.
Fix by following the spirit of the adjacent comment, and using the
dedicated riscv_const_insns() function to calculate cost for loading a
constant element. Infinite recursion is not possible because the first
invocation is on a CONST_VECTOR, whereas the second is on a single
element of the vector (e.g. CONST_INT or CONST_DOUBLE).
Regression tested for riscv32-none-elf. No changes in gcc.sum and
g++.sum.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_const_insns): Recursively call
for constant element of a vector.
Jakub Jelinek [Wed, 7 Jun 2023 17:27:35 +0000 (19:27 +0200)]
libstdc++: Fix up 20_util/to_chars/double.cc test for excess precision [PR110145]
This test apparently contains 3 problematic floating point constants,
1e126, 4.91e-6 and 5.547e-6. These constants suffer from double rounding
when -fexcess-precision=standard evaluates double constants in the precision
of Intel extended 80-bit long double.
As written in the PR, e.g. the first one is
0x1.7a2ecc414a03f7ff6ca1cb527787b130a97d51e51202365p+418
in the precision of GCC's internal format, 80-bit long double has
63-bit precision, so the above constant rounded to long double is
0x1.7a2ecc414a03f800p+418L
(the least significant bit in the 0 before p isn't there already).
0x1.7a2ecc414a03f800p+418L rounded to IEEE double is
0x1.7a2ecc414a040p+418.
Now, if excess precision doesn't happen and we round the GCC's internal
format number directly to double, it is
0x1.7a2ecc414a03fp+418 and that is the number the test expects.
One can see it on x86-64 (where excess precision to long double doesn't
happen) where double(1e126L) != 1e126.
The other two constants suffer from the same problem.
The following patch tweaks the testcase, such that those problematic
constants are used only if FLT_EVAL_METHOD is 0 or 1 (i.e. when we have
guarantee the constants will be evaluated in double precision),
plus adds corresponding tests with hexadecimal constants which don't
suffer from this excess precision problem, they are exact in double
and long double can hold all double values.
2023-06-07 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/110145
* testsuite/20_util/to_chars/double.cc: Include <cfloat>.
(double_to_chars_test_cases,
double_scientific_precision_to_chars_test_cases_2,
double_fixed_precision_to_chars_test_cases_2): #if out 1e126, 4.91e-6
and 5.547e-6 tests if FLT_EVAL_METHOD is negative or larger than 1.
Add unconditional tests with corresponding double constants
0x1.7a2ecc414a03fp+418, 0x1.4981285e98e79p-18 and
0x1.7440bbff418b9p-18.