We fail to verify the constraints under which we allow handled
components to wrap registers. The gcc.dg/pr70022.c testcase shows
that we happily end up with
_2 = VIEW_CONVERT_EXPR<int[4]>(v_1(D))
as produced by SSA rewrite and update_address_taken. But the intent
was that we wrap registers with at most a single level of handled
components and specifically only allow __real, __imag, BIT_FIELD_REF
and VIEW_CONVERT_EXPR on them, but not ARRAY_REF or COMPONENT_REF.
Together with the improved gimple_load predicate taking advantage
of the above, and with ASAN enabled, this eventually ICEd.
The following fixes update_address_taken to honor this constraint.
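To illustrate the constraint, a minimal hedged sketch in C (an assumed
shape, not the actual gcc.dg/pr70022.c contents) of code that views a
vector register as an array:
typedef int v4si __attribute__ ((vector_size (16)));
int
foo (v4si v, int i)
{
  /* Reading the vector back through an int[4] view needs an ARRAY_REF
     on top of a VIEW_CONVERT_EXPR, which is exactly what must not be
     wrapped around a register.  */
  return ((int *) &v)[i];
}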
PR tree-optimization/109594
* tree-ssa.cc (non_rewritable_mem_ref_base): Constrain
what we rewrite to a register based on the above.
Jason Merrill [Thu, 16 Mar 2023 20:55:39 +0000 (16:55 -0400)]
c++: restore instantiate_decl assert
For PR61445 I removed this assert, but PR108242 demonstrated why it's still
useful; to avoid regressing the former testcase I check pattern_defined
in the assert.
This reverts r212524.
PR c++/61445
gcc/cp/ChangeLog:
* pt.cc (instantiate_decl): Assert !defer_ok for local
class members.
With this, execution time for e.g. __moddi3 goes from 59 to 40 cycles in
the "fast" case or from 290 to 200 cycles in the "slow" case (when the
!TARGET_HAS_NO_HW_DIVIDE variant calls division and modulus functions
for 32-bit SImode), as exposed by gcc.c-torture/execute/arith-rand-ll.c
compiled for -march=v10.
Unfortunately, it just puts a performance improvement "dent" of 0.07%
in an arith-rand-ll.c-based performance test, where all loops are also
reduced to 1/10.
The size of every affected libgcc function is reduced to less than
half and they are all now leaf functions.
Jason Merrill [Wed, 22 Mar 2023 20:11:47 +0000 (16:11 -0400)]
c++: local class in nested generic lambda [PR109241]
The earlier fix for PR109241 avoided the crash by handling a type with no
TREE_BINFO. But we want to move toward doing the partial substitution of
classes in generic lambdas, so let's take a step in that direction.
PR c++/109241
gcc/cp/ChangeLog:
* pt.cc (instantiate_class_template): Do partially instantiate.
(tsubst_expr): Do call complete_type for partial instantiations.
Jason Merrill [Wed, 1 Feb 2023 22:00:48 +0000 (17:00 -0500)]
c++: unique friend shenanigans [PR69836]
Normally we re-instantiate a function declaration when we start to
instantiate the body in case of multiple declarations. In this wacky
testcase, this causes a problem because the type of the w_counter parameter
depends on its declaration not being in scope yet, so the name lookup only
finds the previous declaration. This isn't a problem for member functions,
since they aren't subject to argument-dependent lookup. So let's just skip
the regeneration for hidden friends.
Patrick Palka [Wed, 26 Apr 2023 19:43:26 +0000 (15:43 -0400)]
c++: micro-optimize most_specialized_partial_spec
This introduces an early exit test to most_specialized_partial_spec for
templates which have no partial specializations, saving some unnecessary
work during class template instantiation in the common case. In passing,
modernize the code a bit.
gcc/cp/ChangeLog:
* pt.cc (most_specialized_partial_spec): Exit early when
DECL_TEMPLATE_SPECIALIZATIONS is empty. Move local variable
declarations closer to their first use. Remove redundant
flag_concepts test. Remove redundant forward declaration.
Andrew MacLeod [Tue, 28 Mar 2023 15:35:26 +0000 (11:35 -0400)]
Create a lazy ssa_cache.
Sparsely used ssa caches can benefit from using a bitmap to
determine if a name already has an entry. Utilize it in the path query
and remove its private bitmap for tracking the same info.
Also use it in the "assume" query class.
PR tree-optimization/108697
* gimple-range-cache.cc (ssa_global_cache::clear_range): Do
not clear the vector on an out of range query.
(ssa_cache::dump): Use dump_range_query instead of get_range.
(ssa_cache::dump_range_query): New.
(ssa_lazy_cache::dump_range_query): New.
(ssa_lazy_cache::set_range): New.
* gimple-range-cache.h (ssa_cache::dump_range_query): New.
(class ssa_lazy_cache): New.
(ssa_lazy_cache::ssa_lazy_cache): New.
(ssa_lazy_cache::~ssa_lazy_cache): New.
(ssa_lazy_cache::get_range): New.
(ssa_lazy_cache::clear_range): New.
(ssa_lazy_cache::clear): New.
(ssa_lazy_cache::dump): New.
* gimple-range-path.cc (path_range_query::path_range_query): Do
not allocate a ssa_cache object nor has_cache bitmap.
(path_range_query::~path_range_query): Do not free objects.
(path_range_query::clear_cache): Remove.
(path_range_query::get_cache): Adjust.
(path_range_query::set_cache): Remove.
(path_range_query::dump): Don't call through a pointer.
(path_range_query::internal_range_of_expr): Set cache directly.
(path_range_query::reset_path): Clear cache directly.
(path_range_query::ssa_range_in_phi): Fold with globals only.
(path_range_query::compute_ranges_in_phis): Simply set range.
(path_range_query::compute_ranges_in_block): Call cache directly.
* gimple-range-path.h (class path_range_query): Replace bitmap
and cache pointer with lazy cache object.
* gimple-range.h (class assume_query): Use ssa_lazy_cache.
Andrew MacLeod [Tue, 28 Mar 2023 15:32:21 +0000 (11:32 -0400)]
Rename ssa_global_cache to ssa_cache and add has_range
This renames the ssa_global_cache to be ssa_cache. The original use was
to function as a global cache, but its uses have expanded. Remove all mention
of "global" from the class and methods. Also add a has_range method.
* gimple-range-cache.cc (ssa_cache::ssa_cache): Rename.
(ssa_cache::~ssa_cache): Rename.
(ssa_cache::has_range): New.
(ssa_cache::get_range): Rename.
(ssa_cache::set_range): Rename.
(ssa_cache::clear_range): Rename.
(ssa_cache::clear): Rename.
(ssa_cache::dump): Rename and use get_range.
(ranger_cache::get_global_range): Use get_range and set_range.
(ranger_cache::range_of_def): Use get_range.
* gimple-range-cache.h (class ssa_cache): Rename class and methods.
(class ranger_cache): Use ssa_cache.
* gimple-range-path.cc (path_range_query::path_range_query): Use
ssa_cache.
(path_range_query::get_cache): Use get_range.
(path_range_query::set_cache): Use set_range.
* gimple-range-path.h (class path_range_query): Use ssa_cache.
* gimple-range.cc (assume_query::assume_range_p): Use get_range.
(assume_query::range_of_expr): Use get_range.
(assume_query::assume_query): Use set_range.
(assume_query::calculate_op): Use get_range and set_range.
* gimple-range.h (class assume_query): Use ssa_cache.
Andrew MacLeod [Thu, 13 Apr 2023 18:47:47 +0000 (14:47 -0400)]
Add sbr_lazy_vector and adjust (e)vrp sparse cache
Add a sparse vector class for the cache and use it by default.
Rename the evrp_* params to vrp_*, and add a param for small CFGs which use
just the original basic vector.
* gimple-range-cache.cc (sbr_vector::sbr_vector): Add parameter
and local to optionally zero memory.
(sbr_vector::grow): Only zero memory if flag is set.
(class sbr_lazy_vector): New.
(sbr_lazy_vector::sbr_lazy_vector): New.
(sbr_lazy_vector::set_bb_range): New.
(sbr_lazy_vector::get_bb_range): New.
(sbr_lazy_vector::bb_range_p): New.
(block_range_cache::set_bb_range): Check flags and use sbr_lazy_vector.
* gimple-range-gori.cc (gori_map::calculate_gori): Use
param_vrp_switch_limit.
(gori_compute::gori_compute): Use param_vrp_switch_limit.
* params.opt (vrp_sparse_threshold): Rename from evrp_sparse_threshold.
(vrp_switch_limit): Rename from evrp_switch_limit.
(vrp_vector_threshold): New.
Andrew MacLeod [Tue, 25 Apr 2023 19:33:52 +0000 (15:33 -0400)]
Don't save ssa-name pointer in dependency cache.
If the direct dependence fields point directly to an ssa-name,
it's possible that an optimization frees an ssa-name, and the value
pointed to may now be in the free list. Simply maintain the ssa
version number instead.
PR tree-optimization/109417
* gimple-range-gori.cc (range_def_chain::register_dependency):
Save the ssa version number, not the pointer.
(gori_compute::may_recompute_p): No need to check if a dependency
is in the free list.
* gimple-range-gori.h (class range_def_chain): Change ssa1 and ssa2
fields to be unsigned int instead of trees.
(range_def_chain::depend1): Adjust.
(range_def_chain::depend2): Adjust.
* gimple-range.h: Include "ssa.h" to inline ssa_name().
David Edelsohn [Sun, 23 Apr 2023 15:22:06 +0000 (11:22 -0400)]
aix: Default AIX 7.2 to POWER7 server and AIX 7.3 to POWER8 server.
AIX 7.2 minimum ISA is POWER7 and AIX 7.3 minimum ISA is POWER8.
This patch changes the aix72.h configuration to POWER7 with VSX enabled
by default (with the AIX VSX ABI limitations), matching LLVM on AIX,
and changes the aix73.h configuration to POWER8.
gcc/ChangeLog:
* config/rs6000/aix72.h (TARGET_DEFAULT): Use ISA_2_6_MASKS_SERVER.
* config/rs6000/aix73.h (TARGET_DEFAULT): Use ISA_2_7_MASKS_SERVER.
(PROCESSOR_DEFAULT): Use PROCESSOR_POWER8.
Patrick O'Neill [Tue, 18 Apr 2023 21:33:13 +0000 (14:33 -0700)]
RISCV: Inline subword atomic ops
RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.
This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.
libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
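As a hedged illustration (an assumed testcase shape, not taken verbatim
from the new inline-atomics-*.c tests), a subword atomic op that used to
become a libatomic call and can now be inlined with the masking logic:
#include <stdint.h>
uint8_t
fetch_add_u8 (uint8_t *p, uint8_t v)
{
  /* Previously lowered to a libatomic call; with -minline-atomics this
     becomes an inline LR/SC loop on the containing aligned word plus
     masking.  */
  return __atomic_fetch_add (p, v, __ATOMIC_SEQ_CST);
}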
2023-04-18 Patrick O'Neill <patrick@rivosinc.com>
gcc/ChangeLog:
PR target/104338
* config/riscv/riscv-protos.h: Add helper function stubs.
* config/riscv/riscv.cc: Add helper functions for subword masking.
* config/riscv/riscv.opt: Add command-line flag.
* config/riscv/sync.md: Add masking logic and inline asm for fetch_and_op,
fetch_and_nand, CAS, and exchange ops.
* doc/invoke.texi: Add blurb regarding command-line flag.
libgcc/ChangeLog:
PR target/104338
* config/riscv/atomic.c: Add reference to duplicate logic.
gcc/testsuite/ChangeLog:
PR target/104338
* gcc.target/riscv/inline-atomics-1.c: New test.
* gcc.target/riscv/inline-atomics-2.c: New test.
* gcc.target/riscv/inline-atomics-3.c: New test.
* gcc.target/riscv/inline-atomics-4.c: New test.
* gcc.target/riscv/inline-atomics-5.c: New test.
* gcc.target/riscv/inline-atomics-6.c: New test.
* gcc.target/riscv/inline-atomics-7.c: New test.
* gcc.target/riscv/inline-atomics-8.c: New test.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
aarch64: Reimplement RSHRN2 intrinsic patterns with standard RTL codes
Similar to the previous patch, we can reimplement the rshrn2 patterns using standard RTL codes
for shift, truncate and plus with the appropriate constants.
This allows us to get rid of UNSPEC_RSHRN entirely.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_rshrn2<mode>_insn_le):
Reimplement using standard RTL codes instead of unspec.
(aarch64_rshrn2<mode>_insn_be): Likewise.
(aarch64_rshrn2<mode>): Adjust for the above.
* config/aarch64/aarch64.md (UNSPEC_RSHRN): Delete.
aarch64: Reimplement RSHRN intrinsic patterns with standard RTL codes
This patch reimplements the backend patterns for the rshrn intrinsics using standard RTL codes rather than UNSPECS.
We already represent shrn as a truncate of a shift. rshrn can be represented as truncate ((src + (1 << (shft - 1))) >> shft),
similar to how LLVM treats it.
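A hedged scalar model of that representation (names are illustrative):
#include <stdint.h>
uint8_t
rshrn_model (uint16_t src, unsigned shft)
{
  /* Round by adding half of the rounding increment, shift right,
     then narrow (truncate) to the half-width type.  */
  return (uint8_t) ((src + (1u << (shft - 1))) >> shft);
}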
I have a follow-up patch to do the same for the rshrn2 pattern, which will allow us to remove the UNSPEC_RSHRN entirely.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_rshrn<mode>_insn_le): Reimplement
with standard RTL codes instead of an UNSPEC.
(aarch64_rshrn<mode>_insn_be): Likewise.
(aarch64_rshrn<mode>): Adjust for the above.
* config/aarch64/predicates.md (aarch64_simd_rshrn_imm_vec): Define.
Before this patch:
li a5,0
vsetvli zero,a1,e32,m1,ta,ma
vle32.v v24,0(a5) <- can propagate the const 0 to a5 here
vs1r.v v24,0(a0)
After this patch:
vsetvli zero,a1,e32,m1,ta,ma
vle32.v v24,0(zero)
vs1r.v v24,0(a0)
As above, this patch allows propagating the const 0 (aka the zero
register) to the base register of the RVV unit-stride load in the
combine pass. This may benefit the underlying RVV auto-vectorization.
However, the indexed load does not yet perform this optimization; it
will be taken care of in another patch.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_classify_address): Allow
const0_rtx for the RVV load/store.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Aldy Hernandez [Mon, 21 Nov 2022 22:19:02 +0000 (23:19 +0100)]
Remove legacy range support.
This patch removes all the code paths guarded by legacy_mode_p(), thus
allowing us to re-use the int_range<1> idiom for a range of one
sub-range. This allows us to represent these simple ranges in a more
efficient manner.
Aldy Hernandez [Wed, 21 Dec 2022 18:26:00 +0000 (19:26 +0100)]
Fix swapping of ranges.
The legacy range code has logic to swap out of order endpoints in the
irange constructor. The new irange code expects the caller to fix any
inconsistencies, thus speeding up the common case. However, this means
that when we remove legacy, any stragglers must be fixed. This patch
fixes the 3 culprits found during the conversion.
gcc/ChangeLog:
* range-op.cc (operator_cast::op1_range): Use
create_possibly_reversed_range.
(operator_bitwise_and::simple_op1_range_solver): Same.
* value-range.cc (swap_out_of_order_endpoints): Delete.
(irange::set): Remove call to swap_out_of_order_endpoints.
Aldy Hernandez [Thu, 2 Mar 2023 14:43:20 +0000 (15:43 +0100)]
Convert users of legacy API to get_legacy_range() function.
This patch converts the users of the legacy API to a function called
get_legacy_range() which will return the pieces of the soon to be
removed API (min, max, and kind). This is a temporary measure while
these users are converted.
In upcoming patches I will convert most users, but most of the
middle-end warning uses will remain. Naive attempts to remove them
showed that a lot of these uses are quite dependent on the anti-range
idiom, and converting them to the new API broke the tests, even when
the conversion was conceptually correct. Perhaps someone who
understands these passes could take a stab at it. In the meantime,
the legacy uses can be trivially found by grepping for
get_legacy_range.
Roger Sayle [Wed, 26 Apr 2023 08:10:06 +0000 (09:10 +0100)]
[xstormy16] Add support for byte and word swapping instructions.
This patch adds support for xstormy16's swpb (swap bytes) and swpw (swap
words) instructions. The most obvious application of these is to implement
the __builtin_bswap16 and __builtin_bswap32 intrinsics.
Currently, __builtin_bswap16 is implemented as:
foo: mov r7,r2
shl r7,#8
shr r2,#8
or r2,r7
ret
but with this patch becomes:
foo: swpb r2
ret
Likewise, __builtin_bswap32 now becomes:
foo: swpb r2 | swpb r3 | swpw r2,r3
ret
Finally, the swpw instruction on its own can be used to exchange
two word mode registers without a temporary, so a new pattern and
peephole2 have been added to catch this. As described in
PR rtl-optimization/106518, register allocation can (in theory)
be more efficient on targets that provide a swap/exchange instruction.
The slightly unusual swap<mode> naming matches that used in i386.md.
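For the exchange case, a hedged sketch of source that can induce a
register swap at register allocation time (illustrative only, not the
swpw-*.c tests):
short bar (short a, short b);
short
foo (short a, short b)
{
  /* The swapped arguments across the call can force the register
     allocator to exchange two word registers.  */
  return bar (b, a);
}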
2023-04-26 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/stormy16/stormy16.md (bswaphi2): New define_insn.
(bswapsi2): New define_insn.
(swaphi): New define_insn to exchange two registers (swpw).
(define_peephole2): Recognize exchange of registers as swaphi.
gcc/testsuite/ChangeLog
* gcc.target/xstormy16/bswap16.c: New test case.
* gcc.target/xstormy16/bswap32.c: Likewise.
* gcc.target/xstormy16/swpb.c: Likewise.
* gcc.target/xstormy16/swpw-1.c: Likewise.
* gcc.target/xstormy16/swpw-2.c: Likewise.
Richard Biener [Tue, 25 Apr 2023 14:38:44 +0000 (16:38 +0200)]
More last_stmt removal
This adjusts more users of last_stmt where it is clear that debug
stmt skipping is unnecessary. In most cases this also allowed
significant code simplification.
* config/riscv/vector.md: Refine vmadc/vmsbc RA constraint.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/narrow_constraint-13.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-14.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-15.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-16.c: New test.
Kewen Lin [Wed, 26 Apr 2023 05:21:14 +0000 (00:21 -0500)]
rs6000: Guard power9-vector for vsx_scalar_cmp_exp_qp_* [PR108758]
__builtin_vsx_scalar_cmp_exp_qp_{eq,gt,lt,unordered} used
to be guarded with condition TARGET_P9_VECTOR before new
bif framework was introduced (r12-5752-gd08236359eb229),
since r12-5752 they are placed under stanza ieee128-hw,
that is to check condition TARGET_FLOAT128_HW, it caused
test case float128-cmp2-runnable.c to fail at -m32 as the
condition TARGET_FLOAT128_HW isn't satisfied with -m32.
Checking the commit history, I didn't see any notes on
why this condition was changed, so this patch
is to move these bifs from stanza ieee128-hw to stanza
power9-vector as before.
PR target/108758
gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def
(__builtin_vsx_scalar_cmp_exp_qp_eq, __builtin_vsx_scalar_cmp_exp_qp_gt,
__builtin_vsx_scalar_cmp_exp_qp_lt,
__builtin_vsx_scalar_cmp_exp_qp_unordered): Move from stanza ieee128-hw
to power9-vector.
Kewen Lin [Wed, 26 Apr 2023 05:21:05 +0000 (00:21 -0500)]
rs6000: Fix predicate for const vector in sldoi_to_mov [PR109069]
As PR109069 shows, commit r12-6537-g080a06fcb076b3, which
introduced the define_insn_and_split sldoi_to_mov, adopted
easy_vector_constant for the const vector of interest, but that's
wrong since predicate easy_vector_constant doesn't guarantee
each byte in the const vector is the same. One counter
example is the const vector in pr109069-1.c. This patch
introduces a new predicate const_vector_each_byte_same to
ensure all bytes in the given const vector are the same,
considering both int and float. Meanwhile, for constants
which don't meet easy_vector_constant we need to generate a move
instead of just a set, and it uses VECTOR_MEM_ALTIVEC_OR_VSX_P
rather than VECTOR_UNIT_ALTIVEC_OR_VSX_P for V2DImode support
under VSX, since the vector long long type of vec_sld is guarded
under stanza vsx.
PR target/109069
gcc/ChangeLog:
* config/rs6000/altivec.md (sldoi_to_mov<mode>): Replace predicate
easy_vector_constant with const_vector_each_byte_same, add
handlings in preparation for !easy_vector_constant, and update
VECTOR_UNIT_ALTIVEC_OR_VSX_P with VECTOR_MEM_ALTIVEC_OR_VSX_P.
* config/rs6000/predicates.md (const_vector_each_byte_same): New
predicate.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr109069-1.c: New test.
* gcc.target/powerpc/pr109069-2-run.c: New test.
* gcc.target/powerpc/pr109069-2.c: New test.
* gcc.target/powerpc/pr109069-2.h: New test.
RISC-V: Optimize comparison patterns for register allocation
The current RA constraint for RVV comparison instructions does not allow any
overlap between the dest and source operand registers.
For example:
vmseq.vv vd, vs2, vs1
If LMUL = 8, vs2 = v8, vs1 = v16:
In current GCC RA constraint, GCC does not allow vd to be any regno in v8 ~ v23.
However, this is too conservative and not required by the RVV ISA.
Since the dest EEW of a comparison is always EEW = 1, it always follows the overlap
rules for dest EEW < source EEW. So in this case we should give GCC RA the chance
to allocate v8 or v16 for vd, so that we get better vector register usage in RA.
* gcc.target/riscv/rvv/base/binop_vv_constraint-4.c: Adapt testcase.
* gcc.target/riscv/rvv/base/narrow_constraint-17.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-18.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-19.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-20.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-21.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-22.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-23.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-24.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-25.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-26.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-27.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-28.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-29.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-30.c: New test.
* gcc.target/riscv/rvv/base/narrow_constraint-31.c: New test.
Pan Li [Tue, 25 Apr 2023 14:29:04 +0000 (22:29 +0800)]
RISC-V: Bugfix for RVV vbool*_t vn_reference_equal
In most architectures the precision_size of the vbool*_t types is calculated
as a multiple of the type size. For example:
precision_size = type_size * 8 (aka, bit count per byte).
Unfortunately, some architectures like RISC-V will adjust the precision_size
for the vbool*_t types in order to align with the ISA. For example:
type_size      = [1, 1, 1, 1, 2, 4, 8]
precision_size = [1, 2, 4, 8, 16, 32, 64]
Then the precision_size of the RISC-V vbool*_t types will not be a multiple
of the type_size. This patch tries to handle this case when comparing the
vn_reference.
Given we have the below code:
void test_vbool8_then_vbool16(int8_t * restrict in,
                              int8_t * restrict out) {
  vbool8_t v1 = *(vbool8_t*)in;
  vbool16_t v2 = *(vbool16_t*)in;
RISC-V: Add auto-vectorization compile option for RVV
This patch adds 2 compile options for RVV auto-vectorization.
1. -param=riscv-autovec-preference=
This option is to specify the auto-vectorization approach for RVV.
Currently, we only support scalable and fixed-vlmax.
- scalable means VLA auto-vectorization. The vector length is unknown to the
compiler and is a runtime invariant. This approach allows us to compile code
that can run on an RVV CPU of any vector length.
- fixed-vlmax means the compiler knows the RVV CPU vector length and compiles
in fixed-length VLS auto-vectorization mode. If we specify vector-length=512,
the executable can only run on a vector-length = 512 RVV CPU.
- TODO: we may need to support min-length VLS auto-vectorization, meaning the
executable can run on RVV CPUs with a larger vector length.
2. -param=riscv-autovec-lmul=
Specify the LMUL to use for RVV auto-vectorization.
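A hedged usage sketch (using GCC's usual --param spelling; the target
triple is illustrative):
riscv64-unknown-linux-gnu-gcc -O3 -march=rv64gcv \
    --param=riscv-autovec-preference=scalable test.c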
gcc/ChangeLog:
* config/riscv/riscv-opts.h (enum riscv_autovec_preference_enum): Add enum for
auto-vectorization preference.
(enum riscv_autovec_lmul_enum): Add enum for choosing LMUL of RVV
auto-vectorization.
* config/riscv/riscv.opt: Add compile option for RVV auto-vectorization.
avoid splitting small constants in bclri_nottwobits patterns
I have noticed that when we try to clear two bits via a small constant
and ZBS is enabled, GCC splits the operation into two "andi" instructions.
For example for the following C code:
int foo(int a) {
return a & ~ 0x101;
}
GCC generates the following:
foo:
andi a0,a0,-2
andi a0,a0,-257
ret
but it should generate this:
foo:
andi a0,a0,-258
ret
This patch solves the mentioned issue.
gcc/ChangeLog
* config/riscv/bitmanip.md: Updated predicates of bclri<mode>_nottwobits
and bclridisi_nottwobits patterns.
* config/riscv/predicates.md (not_uimm_extra_bit_or_nottwobits): Adjust
predicate to avoid splitting arith constants.
(const_nottwobits_not_arith_operand): New predicate.
gcc/testsuite/ChangeLog
* gcc.target/riscv/zbs-bclri-nottwobits.c: New test.
Gaius Mulley [Wed, 26 Apr 2023 01:55:59 +0000 (02:55 +0100)]
PR modula2/108121 Re-implement overflow detection for constant literals
This patch fixes the overflow detection for constant literals.
The ZTYPE is changed to int128 (or int64 if int128 is unavailable) and
constant literals are built from widest_int. The widest_int is converted
into the tree type and checked for overflow.
m2expr_interpret_integer and append_m2_digit are removed.
gcc/m2/ChangeLog:
PR modula2/108121
* gm2-compiler/M2ALU.mod (Less): Reformatted.
* gm2-compiler/SymbolTable.mod (DetermineSizeOfConstant): Remove
from import.
(ConstantStringExceedsZType): Import.
(GetConstLitType): Re-implement using ConstantStringExceedsZType.
* gm2-gcc/m2decl.cc (m2decl_DetermineSizeOfConstant): Remove.
(m2decl_ConstantStringExceedsZType): New function.
(m2decl_BuildConstLiteralNumber): Re-implement.
* gm2-gcc/m2decl.def (DetermineSizeOfConstant): Remove.
(ConstantStringExceedsZType): New function.
* gm2-gcc/m2decl.h (m2decl_DetermineSizeOfConstant): Remove.
(m2decl_ConstantStringExceedsZType): New function.
* gm2-gcc/m2expr.cc (append_digit): Remove.
(m2expr_interpret_integer): Remove.
(append_m2_digit): Remove.
(m2expr_StrToWideInt): New function.
(m2expr_interpret_m2_integer): Remove.
* gm2-gcc/m2expr.def (CheckConstStrZtypeRange): New function.
* gm2-gcc/m2expr.h (m2expr_StrToWideInt): New function.
* gm2-gcc/m2type.cc (build_m2_word64_type_node): New function.
(build_m2_ztype_node): New function.
(m2type_InitBaseTypes): Call build_m2_ztype_node.
* gm2-lang.cc (gm2_type_for_size): Re-write using early returns.
gcc/testsuite/ChangeLog:
PR modula2/108121
* gm2/pim/fail/largeconst.mod: Increased constant value test
to fail now that cc1gm2 uses widest_int to represent a ZTYPE.
* gm2/pim/fail/largeconst2.mod: New test.
Patrick Palka [Tue, 25 Apr 2023 19:59:22 +0000 (15:59 -0400)]
c++: value dependence of by-ref lambda capture [PR108975]
We are still ICEing on the generic lambda version of the testcase from
this PR, even after r13-6743-g6f90de97634d6f, due to the by-ref capture
of the constant local variable 'dim' being considered value-dependent
when regenerating the lambda (at which point processing_template_decl is
set since the lambda is generic), which prevents us from constant folding
its uses. Later during prune_lambda_captures we end up not thoroughly
walking the body of the lambda and overlook the (non-folded) uses of
'dim' within the array bound and using-decls.
We could fix this by making prune_lambda_captures walk the body of the
lambda more thoroughly so that it finds these uses of 'dim', but ideally
we should be able to constant fold all uses of 'dim' ahead of time and
prune the implicit capture after all.
To that end this patch makes value_dependent_expression_p return false
for such by-ref captures of constant local variables, allowing their
uses to get constant folded ahead of time. It seems we just need to
disable the predicate's conservative early exit for reference variables
(added by r5-5022-g51d72abe5ea04e) when DECL_HAS_VALUE_EXPR_P. This
effectively makes us treat by-value and by-ref captures more consistently
when it comes to value dependence.
PR c++/108975
gcc/cp/ChangeLog:
* pt.cc (value_dependent_expression_p) <case VAR_DECL>:
Suppress conservative early exit for reference variables
when DECL_HAS_VALUE_EXPR_P.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-const11a.C: New test.
riscv: relax splitter restrictions for creating pseudos
[partial addressing of PR/109279]
RISC-V splitters have restrictions to not create pseudos due to a combine
limitation. And despite this being a split-during-combine limitation,
all split passes take the hit due to the way define*_split patterns are
used in GCC.
With the original combine issue fixed by 61bee6aed2 ("combine: Don't
record for UNDO_MODE pointers into regno_reg_rtx array [PR104985]"),
the RV splitters can now be relaxed.
This improves the codegen in general. e.g.
long long f(void) { return 0x0101010101010101ull; }
Before
li a0,0x01010000
addi a0,a0,0x0101
slli a0,a0,16
addi a0,a0,0x0101
slli a0,a0,16
addi a0,a0,0x0101
ret
With patch
li a5,0x01010000
addi a5,a5,0x0101
mv a0,a5
slli a5,a5,32
add a0,a5,a0
ret
This reduces the qemu icounts, even if slightly, across SPEC2017.
This came up as part of IRC chat on PR/109279 and was suggested by
Andrew Pinski.
gcc/ChangeLog:
* config/riscv/riscv.md: riscv_move_integer() drop in_splitter arg.
riscv_split_symbol() drop in_splitter arg.
* config/riscv/riscv.cc: riscv_move_integer() drop in_splitter arg.
riscv_split_symbol() drop in_splitter arg.
riscv_force_temporary() drop in_splitter arg.
* config/riscv/riscv-protos.h: riscv_move_integer() drop in_splitter arg.
riscv_split_symbol() drop in_splitter arg.
Eric Botcazou [Tue, 25 Apr 2023 15:38:31 +0000 (17:38 +0200)]
Avoid creating useless debug temporaries
insert_debug_temp_for_var_def has some strange code whereby it creates
debug temporaries for SINGLE_RHS (RHS for gimple_assign_single_p) but
not for other RHS in the same situation.
gcc/
* tree-ssa.cc (insert_debug_temp_for_var_def): Do not create
superfluous debug temporaries for single GIMPLE assignments.
Richard Biener [Tue, 25 Apr 2023 12:56:44 +0000 (14:56 +0200)]
tree-optimization/109609 - correctly interpret arg size in fnspec
By majority vote, and given the hint from the API name
arg_max_access_size_given_by_arg_p, this interprets a memory access
size specified as given by another argument (such as for strncpy
in the testcase, which has "1cO313") as specifying the _maximum_
size read/written rather than the exact size. Two uses already
interpret it that way and one differs. The following adjusts the
differing use and clarifies the documentation.
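A hedged illustration of the strncpy case (standard semantics; not the
testcase from the PR):
#include <string.h>
char dst[16];
void
f (const char *src)
{
  /* The third argument bounds the access: strncpy reads at most
     sizeof dst bytes from src (fewer if src is shorter), so the size
     given by the argument is a maximum for the read side, not an
     exact size.  */
  strncpy (dst, src, sizeof dst);
}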
PR tree-optimization/109609
* attr-fnspec.h (arg_max_access_size_given_by_arg_p):
Clarify semantics.
* tree-ssa-alias.cc (check_fnspec): Correctly interpret
the size given by arg_max_access_size_given_by_arg_p as
maximum, not exact, size.
While OpenMP 5.0 required a single structured block before and after the
'omp scan' directive, OpenMP 5.1 changed this to a 'structured block sequence',
denoting 2 or more executable statements in OpenMP 5.1 (whoops!) and zero or
more in OpenMP 5.2. This commit updates C/C++ to accept zero statements (but
still requires the '{' ... '}' for the final-loop-body) and updates Fortran
to accept zero or more than one statement.
If there is no preceding or succeeding executable statement, a warning is
shown.
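A hedged C sketch of the newly accepted zero-statement case (an assumed
shape; see the new scan-*.c tests for the real coverage):
int
foo (int *a, int n)
{
  int r = 0;
  #pragma omp simd reduction (inscan, +:r)
  for (int i = 0; i < n; i++)
    {
      r += a[i];
      #pragma omp scan inclusive (r)
      { }  /* zero statements after 'omp scan'; braces still required,
              and a warning is expected here.  */
    }
  return r;
}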
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_scan_loop_body): Handle
zero exec statements before/after 'omp scan'.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_scan_loop_body): Handle
zero exec statements before/after 'omp scan'.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_resolve_omp_do_blocks): Handle zero
or more than one exec statements before/after 'omp scan'.
* trans-openmp.cc (gfc_trans_omp_do): Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/scan-1.c: New test.
* testsuite/libgomp.c/scan-23.c: New test.
* testsuite/libgomp.fortran/scan-2.f90: New test.
Jakub Jelinek [Tue, 25 Apr 2023 14:00:48 +0000 (16:00 +0200)]
testsuite: Fix up ext-floating2.C on powerpc64-linux
Another testcase that is failing on powerpc64-linux. The test expects
a diagnostic when float64 && float128 or, in another spot, when
float32 && float128. Now, the float128 effective target is satisfied on
powerpc64-linux, despite __STDCPP_FLOAT128_T__ not being defined, because
one needs to add some extra options for it. I think 32-bit arm has a
similar case for float16.
2023-04-25 Jakub Jelinek <jakub@redhat.com>
* g++.dg/cpp23/ext-floating2.C: Add dg-add-options for
float16, float32, float64 and float128.
aarch64: Implement V2DI,V4SI division optabs for TARGET_SVE
Similar to the mulv2di case, we can use SVE instructions to implement the V4SI and V2DI optabs
for signed and unsigned integer division.
This allows us to generate much cleaner code for the testcase than the current:
food:
fmov x1, d1
fmov x0, d0
umov x2, v0.d[1]
sdiv x0, x0, x1
umov x1, v1.d[1]
sdiv x1, x2, x1
fmov d0, x0
ins v0.d[1], x1
ret
which now becomes:
food:
ptrue p0.b, all
sdiv z0.d, p0/m, z0.d, z1.d
ret
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (<su_optab>div<mode>3): New define_expand.
* config/aarch64/iterators.md (VQDIV): New mode iterator.
(vnx2di): New mode attribute.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve-neon-modes_3.c: New test.
Jakub Jelinek [Tue, 25 Apr 2023 12:38:01 +0000 (14:38 +0200)]
testsuite: Fix up ext-floating15.C tests on powerpc64-linux [PR109278]
I've noticed this test FAILs on powerpc64-linux, with
FAIL: g++.dg/cpp23/ext-floating15.C -std=gnu++98 (test for excess errors)
Excess errors:
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:5: error: '_Float128' is not supported on this target
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:5: error: '_Float128' is not supported on this target
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:1: error: variable or field 'bar' declared void
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:5: error: '_Float128' is not supported on this target
/home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:6: error: expected primary-expression before '_Float128'
and similarly other std versions.
powerpc64-linux is float128 target, but needs to add some options for it.
Richard Biener [Mon, 24 Apr 2023 11:31:07 +0000 (13:31 +0200)]
rtl-optimization/109585 - alias analysis typo
When r10-514-gc6b84edb6110dd2b4fb improved access path analysis
it introduced a typo that triggers when there's an access to a
trailing array in the first access path, leading to false
disambiguation.
Jakub Jelinek [Tue, 25 Apr 2023 12:20:51 +0000 (14:20 +0200)]
powerpc: Fix up *branch_anddi3_dot for -m32 -mpowerpc64 [PR109566]
The following testcase reduced from newlib ICEs on powerpc-linux,
with -O2 -m32 -mpowerpc64 since r12-6433 PR102239 optimization was
added and on the original testcase since some ranger improvements in
GCC 13 made it no longer latent on newlib.
The problem is that the *branch_anddi3_dot define_insn_and_split
relies on the *rotldi3_mask_dot define_insn_and_split being recognized
during splitting. The rs6000_is_valid_rotate_dot_mask function checks whether
the mask is a CONST_INT which is a valid mask, but *rotl<mode>3_mask_dot in
addition to checking that it is a valid mask also has
(<MODE>mode == Pmode || UINTVAL (operands[3]) <= 0x7fffffff)
test in the condition. For TARGET_64BIT that doesn't add any further
requirements, but for !TARGET_64BIT && TARGET_POWERPC64 if the AND
second operand is larger than INT_MAX it will not be recognized.
The rs6000_is_valid_rotate_dot_mask function is used solely in one spot,
condition of *branch_anddi3_dot, so the following patch adjusts it
to check for that as well.
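A hedged sketch of the shape involved (not the actual reduced newlib
testcase): a 64-bit AND whose mask is a valid rotate mask above INT_MAX
feeding a branch, compiled with -O2 -m32 -mpowerpc64:
unsigned long long x;
void g (void);
void
f (void)
{
  if (x & 0xffff0000ULL)  /* contiguous mask, but > 0x7fffffff */
    g ();
}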
2023-04-25 Jakub Jelinek <jakub@redhat.com>
PR target/109566
* config/rs6000/rs6000.cc (rs6000_is_valid_rotate_dot_mask): For
!TARGET_64BIT, don't return true if UINTVAL (mask) << (63 - nb)
is larger than signed int maximum.
Martin Liska [Thu, 6 Apr 2023 09:54:51 +0000 (11:54 +0200)]
gcov: add info about "calls" to JSON output format
gcc/ChangeLog:
* doc/gcov.texi: Document the new "calls" field and document
the API bump. Mention also "block_ids" for lines.
* gcov.cc (output_intermediate_json_line): Output info about
calls and extend branches as well.
(generate_results): Bump version to 2.
(output_line_details): Use the block ID instead of a nonsensical
index.
gcc/testsuite/ChangeLog:
* g++.dg/gcov/gcov-17.C: Add call to a noreturn function.
* g++.dg/gcov/test-gcov-17.py: Cover new format.
* lib/gcov.exp: Add options for gcov that emit the extra info.
Roger Sayle [Tue, 25 Apr 2023 11:04:52 +0000 (12:04 +0100)]
[Committed] Correct zeroextendqihi2 insn length regression on xstormy16.
My recent tweak to the zeroextendqihi2 pattern on xstormy16 incorrectly
handled the case where the operand was a MEM. MEM operands use a longer
encoding than REG operands, and the incorrect instruction length resulted
in assembler errors (as reported by Jeff Law). This patch restores the
original length resolving this regression. Sorry for the inconvenience.
Committed as obvious, after testing that a cross-compiler to xstormy16-elf
builds from x86_64-pc-linux-gnu, and that gcc.c-torture/execute/memset-2.c
no longer causes "operand out of range" issues in gas.
2023-04-25 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/stormy16/stormy16.md (zero_extendqihi2): Restore/fix
length attribute for the first (memory operand) alternative.
aarch64: Leveraging the use of STP instruction for vec_duplicate
The backend pattern for storing a pair of identical values in 32 and
64-bit modes with the machine instruction STP was missing, and
multiple instructions were needed to reproduce this behavior as a
result of a failed RTL pattern match in the combine pass.
For the test case:
typedef long long v2di __attribute__((vector_size (16)));
typedef int v2si __attribute__((vector_size (8)));
void
foo (v2di *x, long long a)
{
v2di tmp = {a, a};
*x = tmp;
}
void
foo2 (v2si *x, int a)
{
v2si tmp = {a, a};
*x = tmp;
}
at -O2 on aarch64 gives:
foo:
stp x1, x1, [x0]
ret
foo2:
stp w1, w1, [x0]
ret
instead of:
foo:
dup v0.2d, x1
str q0, [x0]
ret
foo2:
dup v0.2s, w1
str d0, [x0]
ret
Bootstrapped and regtested on aarch64-none-linux-gnu.
I think it's best to specify the default behavior of nan_state, since
it's not obvious that nan_state() defaults to TRUE. Also, this avoids
the ugly nan_state(false, false) idiom.
gcc/ChangeLog:
* value-range.cc (frange::set): Adjust constructor.
* value-range.h (nan_state::nan_state): Replace default
constructor with one taking an argument.
Eric Botcazou [Tue, 25 Apr 2023 08:46:16 +0000 (10:46 +0200)]
Remove obsolete configure code in gnattools
It was recently pointed out that we generate symbolic links to ghost files
when building the GNAT tools, as the mlib-tgt-specific-*.adb files are gone.
Aldy Hernandez [Mon, 21 Nov 2022 22:18:43 +0000 (23:18 +0100)]
Pass correct type to irange::contains_p() in ipa-cp.cc.
There is a call to contains_p() in ipa-cp.cc which passes incompatible
types. This currently works because deep in the call chain, the legacy
code uses tree_int_cst_lt which performs the operation with
widest_int. With the upcoming removal of legacy, contains_p() will be
stricter.
gcc/ChangeLog:
* ipa-cp.cc (ipa_range_contains_p): New.
(decide_whether_version_node): Use it.
Andrew Pinski [Tue, 25 Apr 2023 00:17:27 +0000 (17:17 -0700)]
Add alternative testcase of phi-opt-25.c that tests phiopt
Right now phi-opt-25.c has tests like `a ? func(a) : CST`
but if we add the simplifications to match.pd, then phi-opt-25.c
will no longer be testing phiopt to make sure these get optimized.
So this adds an alternative version which is designed to test
phiopt.
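For reference, a hedged sketch of that kind of test (illustrative; not
the actual phi-opt-25.c contents):
int
f (int a)
{
  /* a ? func(a) : CST where CST equals func(0), so the conditional can
     be simplified to a plain call.  */
  return a ? __builtin_popcount (a) : 0;
}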
Committed as obvious after testing the testcase to make sure it does not
fail on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (is_combined_permutation_identity): Try to
simplify two successive VEC_PERM_EXPRs with same VLA mask,
where mask chooses elements in reverse order.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general/rev-1.c: New test.
Patrick Palka [Mon, 24 Apr 2023 17:39:54 +0000 (13:39 -0400)]
libstdc++: Fix __max_diff_type::operator>>= for negative values
This patch fixes sign bit propagation when right-shifting a negative
__max_diff_type value by more than one, a bug that our existing test
coverage didn't expose until r14-159-g03cebd304955a6 fixed the front
end's 'signed typedef-name' handling that the test relies on (which is
a non-standard extension to the language grammar).
libstdc++-v3/ChangeLog:
* include/bits/max_size_type.h (__max_diff_type::operator>>=):
Fix propagation of sign bit.
* testsuite/std/ranges/iota/max_size_type.cc: Avoid using the
non-standard 'signed typedef-name'. Add some compile-time tests
for right-shifting a negative __max_diff_type value by more than
one.
Andrew Pinski [Fri, 21 Apr 2023 21:45:56 +0000 (14:45 -0700)]
PHIOPT: Add support for diamond shaped bb to match_simplify_replacement
This adds the diamond shaped form of basic blocks to match_simplify_replacement.
This patch is the start of removing/moving all
of what minmax_replacement does to match.pd to reduce the code duplication.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Note the phi-opt-{23,24}.c testcases had an incorrect xfail, as there should
still have been 2 ifs because f4/f5 would not be transformed: -ABS is
not allowed during early phi-opt.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (match_simplify_replacement): Add new arguments
and support diamond shaped basic block form.
(tree_ssa_phiopt_worker): Update call to match_simplify_replacement.
Andrew Pinski [Sun, 9 Apr 2023 22:47:50 +0000 (22:47 +0000)]
PHIOPT: Ignore predicates for match-and-simplify phi-opt
This fixes a missed optimization where early phi-opt would
not work when there was predicates. The easiest fix is
to change empty_bb_or_one_feeding_into_p to ignore those
statements while checking for the single feeding statement.
Note phi-opt-23.c and phi-opt-24.c still fail as we don't handle
diamond form in match_and_simplify phiopt yet.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p):
Instead of calling last_and_only_stmt, look for the last statement
manually.
Andrew Pinski [Fri, 31 Mar 2023 17:29:26 +0000 (17:29 +0000)]
PHIOPT: Factor out some code from match_simplify_replacement
This factors out the code checking if we have an empty bb
or one statement that feeds into the phi so it can be used
when adding diamond shaped bb form to match_simplify_replacement
in the next patch. Also allows for some improvements
in the next patches too.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p):
New function.
(match_simplify_replacement): Call
empty_bb_or_one_feeding_into_p instead of doing it inline.
Andrew Pinski [Thu, 20 Apr 2023 17:56:17 +0000 (10:56 -0700)]
PHIOPT: Allow other diamond uses when do_hoist_loads is true
While working on adding the diamond shaped form to match-and-simplify
phiopt, I noticed that we would not reach there if do_hoist_loads
was true. In the original code before the cleanups it was not
obvious why, but after I finished the cleanups it was just a matter
of removing a continue, and that is what this patch does.
This also happens to fix a bug report that I noticed.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/68894
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove the
continue for the do_hoist_loads diamond case.
Andrew Pinski [Thu, 20 Apr 2023 17:26:43 +0000 (10:26 -0700)]
PHIOPT: Cleanup tree_ssa_phiopt_worker code
This patch cleans up tree_ssa_phiopt_worker by merging
common code and handling do_store_elim earlier.
Note this does not change any overall logic of the code,
just moves code around enough to be able to do this.
This will make it easier to move code around even more,
along with a few other fixes I have.
Plus I think all of the do_store_elim code really
should move to its own function, as it is now obvious that
not much code is shared.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Rearrange
code for better code readability.
Andrew Pinski [Thu, 20 Apr 2023 16:23:25 +0000 (09:23 -0700)]
PHIOPT: Move check on diamond bb to tree_ssa_phiopt_worker from minmax_replacement
This moves the check that, in the diamond shaped form,
the two middle bbs are only used by that diamond shaped form, earlier
into the shared code.
Also remove the redundant check for single_succ_p since that was already
done beforehand.
The next patch will simplify the code even further and remove redundant
checks.
PR tree-optimization/109604
gcc/ChangeLog:
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Move the
diamond form check from ...
(minmax_replacement): Here.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr109604-1.c: New test.
* gcc.c-torture/compile/pr109604-2.c: New test.
Patrick Palka [Mon, 24 Apr 2023 14:33:49 +0000 (10:33 -0400)]
c++, tree: declare some basic functions inline
The functions strip_array_types, is_typedef_decl, typedef_variant_p
and cp_expr_location are used throughout the C++ front end including in
some fairly hot parts (e.g. in the tsubst routines and cp_walk_subtree)
and they're small enough that the overhead of calling them out-of-line
is relatively significant.
So this patch moves their definitions into the appropriate headers to
enable inlining them.
Motivated by a recent LLVM patch I saw, we can use SVE for 64-bit vector integer MUL (plain Advanced SIMD doesn't support it).
Since the Advanced SIMD regs are just the low 128-bit part of the SVE regs it all works transparently.
It's a reasonably straightforward implementation of the mulv2di3 optab that wires it up through the mulvnx2di3 expander and
subregs the results back to the Advanced SIMD modes.
There's more such tricks possible with other operations (and we could do 64-bit multiply-add merged operations too) but for now
this self-contained patch improves the mul case as without it for the testcases in the patch we'd have scalarised the arguments,
moved them to GP regs, performed two GP MULs and moved them back to SIMD regs.
Advertising a mulv2di3 optab from the backend should also allow for more flexible vectorisation opportunities.
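For example, a hedged sketch of the kind of testcase involved (see the
new sve-neon-modes_*.c tests for the real ones):
typedef long long v2di __attribute__ ((vector_size (16)));
v2di
mul (v2di a, v2di b)
{
  /* Without a mulv2di3 expander this scalarises through the GP regs;
     with it a single SVE mul on the low 128 bits suffices.  */
  return a * b;
}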
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (mulv2di3): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve-neon-modes_1.c: New test.
* gcc.target/aarch64/sve-neon-modes_2.c: New test.
install.texi needs some updates for GCC 13 and trunk:
* We used a mixture of Solaris 2 and Solaris references. Since Solaris
1/SunOS 4 is ancient history by now, consistently use Solaris
everywhere. Likewise, explicit references to Solaris 11 can go in
many places since Solaris 11.3 and 11.4 is all GCC supports.
* Some caveats apply to both Solaris/SPARC and x86, like the difference
between as and gas.
* Some specifics are obsolete, like the /usr/ccs/bin path whose contents
was merged into /usr/bin in Solaris 11.0 already. Likewise, /bin/sh
is ksh93 since Solaris 11.0, so there's no need to explicitly use
/bin/ksh.
* I've removed the reference to OpenCSW: there's barely a need for external
sites to get additional packages. OpenCSW is mostly unmaintained these
days and has been found to be more harmful than helpful.
* The section on assembler and linker to use was partially duplicated.
Better keep the info in one place.
* GNAT is bundled in recent Solaris 11.4 updates, so recommend that.
Tested on i386-pc-solaris2.11 with make doc/gccinstall.{info,pdf} and
inspection of the latter.
aarch64: PR target/109406 Add support for SVE2 unpredicated MUL
SVE2 supports an unpredicated vector integer MUL form that we can emit from our SVE expanders
without using up a predicate register. This patch does so.
As the SVE MUL expansion currently is templated away through a code iterator I did not split it
off just for this case but instead special-cased it in the define_expand. It seemed somewhat less
invasive than the alternatives but I could split it off more explicitly if others want to.
The div-by-bitmask_1.c testcase is adjusted to expect this new MUL form.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
PR target/109406
* config/aarch64/aarch64-sve.md (<optab><mode>3): Handle TARGET_SVE2 MUL
case.
* config/aarch64/aarch64-sve2.md (*aarch64_mul_unpredicated_<mode>): New
pattern.
gcc/testsuite/ChangeLog:
PR target/109406
* gcc.target/aarch64/sve2/div-by-bitmask_1.c: Adjust for unpredicated SVE2
MUL.
* gcc.target/aarch64/sve2/unpred_mul_1.c: New test.
[4/4] aarch64: Convert UABAL2 and SABAL2 patterns to standard RTL codes
The final patch in the series tackles the most complex of this family of patterns, UABAL2 and SABAL2.
These extract the high part of the sources, perform an absdiff on them, widen the result and accumulate.
The motivating testcase for this patch (series) is included and the simplification required doesn't actually
trigger with just the RTL pattern change because rtx_costs block it.
So this patch also extends rtx costs to recognise the (minus (smax (x, y)) (smin (x, y))) expression we use
to describe absdiff in the backend and avoid recursing into its arms.
This allows us to generate the single-instruction sequence expected here.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur>abal2<mode>): Rename to...
(aarch64_<su>abal2<mode>_insn): ... This. Use RTL codes instead of unspec.
(aarch64_<su>abal2<mode>): New define_expand.
* config/aarch64/aarch64.cc (aarch64_abd_rtx_p): New function.
(aarch64_rtx_costs): Handle ABD rtxes.
* config/aarch64/aarch64.md (UNSPEC_SABAL2, UNSPEC_UABAL2): Delete.
* config/aarch64/iterators.md (ABAL2): Delete.
(sur): Remove handling of UNSPEC_UABAL2 and UNSPEC_SABAL2.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/vabal_combine.c: New test.
[3/4] aarch64: Convert UABAL and SABAL patterns to standard RTL codes
With the SABDL and UABDL patterns converted, their accumulating forms UABAL and SABAL are not much more complicated.
There's an accumulator argument that we, err, accumulate into with a PLUS once all the widening is done.
Some necessary renaming of patterns relating to the removal of UNSPEC_SABAL and UNSPEC_UABAL is included.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur>abal<mode>): Rename to...
(aarch64_<su>abal<mode>): ... This. Use RTL codes instead of unspec.
(<sur>sadv16qi): Rename to...
(<su>sadv16qi): ... This. Adjust for the above.
* config/aarch64/aarch64-sve.md (<sur>sad<vsi2qi>): Rename to...
(<su>sad<vsi2qi>): ... This. Adjust for the above.
* config/aarch64/aarch64.md (UNSPEC_SABAL, UNSPEC_UABAL): Delete.
* config/aarch64/iterators.md (ABAL): Delete.
(sur): Remove handling of UNSPEC_SABAL and UNSPEC_UABAL.
[2/4] aarch64: Convert UABDL2 and SABDL2 patterns to standard RTL codes
Similar to the previous patch for UABDL and SABDL, this patch covers the *2 versions that vec_select the high half
of their input to do the absdiff and extend. A define_expand is added for the intrinsic to create the "select-high-half" RTX the pattern expects.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur>abdl2<mode>): Rename to...
(aarch64_<su>abdl2<mode>_insn): ... This. Use RTL codes instead of unspec.
(aarch64_<su>abdl2<mode>): New define_expand.
* config/aarch64/aarch64.md (UNSPEC_SABDL2, UNSPEC_UABDL2): Delete.
* config/aarch64/iterators.md (ABDL2): Delete.
(sur): Remove handling of UNSPEC_SABDL2 and UNSPEC_UABDL2.
[1/4] aarch64: Convert UABDL and SABDL patterns to standard RTL codes
This is the first patch in a series to improve the RTL representation of the sum-of-absolute-differences patterns
in the backend. We can use standard RTL codes and remove some unspecs.
For UABDL and SABDL we have a widening of the result, so we can represent uabdl (x, y) as (zero_extend (minus (umax (x, y)) (umin (x, y))))
and sabdl (x, y) as (zero_extend (minus (smax (x, y)) (smin (x, y)))).
It is important to use zero_extend rather than sign_extend for the sabdl case, as the result of the absolute difference is still a positive unsigned value
(the signedness of the operation refers to the values being diffed, not the absolute value of the difference) that must be zero-extended.
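A hedged scalar model of the widening absolute difference described
above (illustrative names):
#include <stdint.h>
/* Unsigned inputs: |x - y| via umax/umin, widened by zero-extension.  */
uint16_t
uabdl_model (uint8_t x, uint8_t y)
{
  return (uint16_t) ((x > y ? x : y) - (x < y ? x : y));
}
/* Signed inputs: |x - y| via smax/smin; the difference is still
   non-negative, hence also zero-extended.  */
uint16_t
sabdl_model (int8_t x, int8_t y)
{
  return (uint16_t) ((x > y ? x : y) - (x < y ? x : y));
}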
Bootstrapped and tested on aarch64-none-linux-gnu (these intrinsics are reasonably well-covered by the advsimd-intrinsics.exp tests)
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur>abdl<mode>): Rename to...
(aarch64_<su>abdl<mode>): ... This. Use standard RTL ops instead of
unspec.
* config/aarch64/aarch64.md (UNSPEC_SABDL, UNSPEC_UABDL): Delete.
* config/aarch64/iterators.md (ABDL): Delete.
(sur): Remove handling of UNSPEC_SABDL and UNSPEC_UABDL.
aarch64: Add pattern to match zero-extending scalar result of ADDLV
The vaddlv_u8 and vaddlv_u16 intrinsics produce a widened scalar result (uint16_t and uint32_t).
The ADDLV instructions themselves zero the rest of the V register, which gives us a free zero-extension
to 32 and 64 bits, similar to how it works on the GP reg side.
Because we don't model that zero-extension in the machine description this can cause GCC to move the
results of these instructions to the GP regs just to do a (superfluous) zero-extension.
This patch just adds a pattern to catch these cases. For the testcases we can now generate no zero-extends
or GP<->FP reg moves, whereas before we generated stuff like:
foo_8_32:
uaddlv h0, v0.8b
umov w1, v0.h[0] // FP<->GP move with zero-extension!
str w1, [x0]
ret
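A hedged sketch of the corresponding source shape (assumed from the asm
above, not the actual testcase):
#include <arm_neon.h>
#include <stdint.h>
void
foo_8_32 (uint32_t *out, uint8x8_t v)
{
  /* vaddlv_u8 returns uint16_t; the assignment to uint32_t is the
     zero-extension the new pattern keeps on the SIMD side.  */
  *out = vaddlv_u8 (v);
}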
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md
(*aarch64_<su>addlv<VDQV_L:mode>_ze<GPI:mode>): New pattern.
Richard Biener [Tue, 18 Apr 2023 15:26:57 +0000 (17:26 +0200)]
This replaces uses of last_stmt where we do not require debug skipping
There are quite a few cases which want to access the control stmt
ending a basic block. Since there cannot be debug stmts after
such a stmt, there's no point in using last_stmt, which skips debug
stmts and can be a compile-time hog for larger testcases.