]> git.ipfire.org Git - thirdparty/gcc.git/log
thirdparty/gcc.git
2 years agoLoongArch: Fix MUSL_DYNAMIC_LINKER
Peng Fan [Wed, 19 Apr 2023 08:23:42 +0000 (16:23 +0800)] 
LoongArch: Fix MUSL_DYNAMIC_LINKER

The system based on musl has no '/lib64', so change it.

https://wiki.musl-libc.org/guidelines-for-distributions.html,
"Multilib/multi-arch" section of this introduces it.

gcc/
* config/loongarch/gnu-user.h (MUSL_DYNAMIC_LINKER): Redefine.

Signed-off-by: Peng Fan <fanpeng@loongson.cn>
Suggested-by: Xi Ruoyao <xry111@xry111.site>
2 years agoRISC-V: Add local user vsetvl instruction elimination [PR109547]
Juzhe-Zhong [Fri, 7 Apr 2023 01:34:13 +0000 (09:34 +0800)] 
RISC-V: Add local user vsetvl instruction elimination [PR109547]

This patch is to enhance optimization for auto-vectorization.

Before this patch:

Loop:
vsetvl a5,a2...
vsetvl zero,a5...
vle

After this patch:

Loop:
vsetvl a5,a2
vle

gcc/ChangeLog:

PR target/109547
* config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): New function.
(vector_insn_info::skip_avl_compatible_p): Ditto.
(vector_insn_info::merge): Remove default value.
(pass_vsetvl::compute_local_backward_infos): Ditto.
(pass_vsetvl::cleanup_insns): Add local vsetvl elimination.
* config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:

PR target/109547
* gcc.target/riscv/rvv/vsetvl/pr109547.c: New.
* gcc.target/riscv/rvv/vsetvl/vsetvl-17.c: Update scan
condition.

2 years agoDaily bump.
GCC Administrator [Fri, 21 Apr 2023 00:17:31 +0000 (00:17 +0000)] 
Daily bump.

2 years agoupdate_web_docs_git: Allow setting TEXI2*, add git build default
Arsen Arsenović [Thu, 6 Apr 2023 10:20:57 +0000 (12:20 +0200)] 
update_web_docs_git: Allow setting TEXI2*, add git build default

maintainer-scripts/ChangeLog:

* update_web_docs_git: Add a mechanism to override makeinfo,
texi2dvi and texi2pdf, and default them to
/home/gccadmin/texinfo/install-git/bin/${tool}, if present.

2 years agoc++: simplify TEMPLATE_TYPE_PARM level lowering
Patrick Palka [Thu, 20 Apr 2023 19:16:59 +0000 (15:16 -0400)] 
c++: simplify TEMPLATE_TYPE_PARM level lowering

1. Don't bother recursing when level lowering a cv-qualified type
   template parameter.
2. Get rid of the recursive loop breaker when level lowering a
   constrained auto, and enable the TEMPLATE_PARM_DESCENDANTS cache in
   this case too.  This should be safe to do so now that we no longer
   substitute constraints on an auto.

gcc/cp/ChangeLog:

* pt.cc (tsubst) <case TEMPLATE_TYPE_PARM>: Don't recurse when
level lowering a cv-qualified type template parameter.  Remove
recursive loop breaker in the level lowering case for constrained
autos.  Use the TEMPLATE_PARM_DESCENDANTS cache in this case as
well.

2 years agoc++: use TREE_VEC for trailing args of variadic built-in traits
Patrick Palka [Thu, 20 Apr 2023 19:00:06 +0000 (15:00 -0400)] 
c++: use TREE_VEC for trailing args of variadic built-in traits

This patch makes us use TREE_VEC instead of TREE_LIST to represent the
trailing arguments of a variadic built-in trait.  These built-ins are
typically passed a simple pack expansion as the second argument, e.g.

   __is_constructible(T, Ts...)

and the main benefit of this representation change is that substituting
into this argument list is now basically free since tsubst_template_args
makes sure we reuse the TREE_VEC of the corresponding ARGUMENT_PACK when
expanding such a pack expansion.  In the previous TREE_LIST representation
we would need need to convert the expanded pack expansion into a TREE_LIST
(via tsubst_tree_list).

Note that an empty set of trailing arguments is now represented as an
empty TREE_VEC instead of NULL_TREE, so now TRAIT_TYPE/EXPR_TYPE2 will
be empty only for unary traits.

gcc/cp/ChangeLog:

* constraint.cc (diagnose_trait_expr): Convert a TREE_VEC
of arguments into a TREE_LIST for sake of pretty printing.
* cxx-pretty-print.cc (pp_cxx_trait): Handle TREE_VEC
instead of TREE_LIST of trailing variadic trait arguments.
* method.cc (constructible_expr): Likewise.
(is_xible_helper): Likewise.
* parser.cc (cp_parser_trait): Represent trailing variadic trait
arguments as a TREE_VEC instead of TREE_LIST.
* pt.cc (value_dependent_expression_p): Handle TREE_VEC
instead of TREE_LIST of trailing variadic trait arguments.
* semantics.cc (finish_type_pack_element): Likewise.
(check_trait_type): Likewise.

2 years agoc++: make strip_typedefs generalize strip_typedefs_expr
Patrick Palka [Thu, 20 Apr 2023 19:00:04 +0000 (15:00 -0400)] 
c++: make strip_typedefs generalize strip_typedefs_expr

Currently if we have a TREE_VEC of types that we want to strip of typedefs,
we unintuitively need to call strip_typedefs_expr instead of strip_typedefs
since only strip_typedefs_expr handles TREE_VEC, and it also dispatches
to strip_typedefs when given a type.  But this seems backwards: arguably
strip_typedefs_expr should be the more specialized function, which
strip_typedefs dispatches to (and thus generalizes).

So this patch makes strip_typedefs subsume strip_typedefs_expr rather
than vice versa, which allows for some simplifications.

gcc/cp/ChangeLog:

* tree.cc (strip_typedefs): Move TREE_LIST handling to
strip_typedefs_expr.  Dispatch to strip_typedefs_expr for
non-type 't'.
<case TYPENAME_TYPE>: Remove manual dispatching to
strip_typedefs_expr.
<case TRAIT_TYPE>: Likewise.
(strip_typedefs_expr): Replaces calls to strip_typedefs_expr
with strip_typedefs throughout.  Don't dispatch to strip_typedefs
for type 't'.
<case TREE_LIST>: Replace this with the better version from
strip_typedefs.

2 years agodoc: Remove repeated word (typo)
Alejandro Colomar [Thu, 20 Apr 2023 17:18:55 +0000 (19:18 +0200)] 
doc: Remove repeated word (typo)

gcc/ChangeLog:

* doc/extend.texi (Common Function Attributes): Remove duplicate
word.

Signed-off-by: Alejandro Colomar <alx@kernel.org>
2 years agoDo not ignore UNDEFINED ranges when determining PHI equivalences.
Andrew MacLeod [Thu, 20 Apr 2023 17:10:40 +0000 (13:10 -0400)] 
Do not ignore UNDEFINED ranges when determining PHI equivalences.

Do not ignore UNDEFINED name arguments when registering two-way equivalences
from PHIs.

PR tree-optimization/109564
gcc/
* gimple-range-fold.cc (fold_using_range::range_of_phi): Do no ignore
UNDEFINED range names when deciding if all PHI arguments are the same,

gcc/testsuite/
* gcc.dg/torture/pr109564-1.c: New testcase.
* gcc.dg/torture/pr109564-2.c: Likewise.
* gcc.dg/tree-ssa/evrp-ignore.c: XFAIL.
* gcc.dg/tree-ssa/vrp06.c: Likewise.

2 years agotree-vect-patterns: One small vect_recog_ctz_ffs_pattern tweak [PR109011]
Jakub Jelinek [Thu, 20 Apr 2023 17:44:27 +0000 (19:44 +0200)] 
tree-vect-patterns: One small vect_recog_ctz_ffs_pattern tweak [PR109011]

I've noticed I've made a typo, ifn in this function this late
is always only IFN_CTZ or IFN_FFS, never IFN_CLZ.

Due to this typo, we weren't using the originally intended
.CTZ (X) = .POPCOUNT ((X - 1) & ~X)
but
.CTZ (X) = PREC - .POPCOUNT (X | -X)
instead when we want to emit __builtin_ctz*/.CTZ using .POPCOUNT.
Both compute the same value, both are defined at 0 with the
same value (PREC), both have same number of GIMPLE statements,
but I think the former ought to be preferred, because lots of targets
have andn as a single operation rather than two, and also putting
a -1 constant into a vector register is often cheaper than vector
with broadcast PREC power of two value.

2023-04-20  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/109011
* tree-vect-patterns.cc (vect_recog_ctz_ffs_pattern): Use
.CTZ (X) = .POPCOUNT ((X - 1) & ~X) in preference to
.CTZ (X) = PREC - .POPCOUNT (X | -X).

2 years agoc: Avoid -Wenum-int-mismatch warning for redeclaration of builtin acc_on_device ...
Jakub Jelinek [Thu, 20 Apr 2023 17:26:17 +0000 (19:26 +0200)] 
c: Avoid -Wenum-int-mismatch warning for redeclaration of builtin acc_on_device [PR107041]

The new -Wenum-int-mismatch warning triggers with -Wsystem-headers in
<openacc.h>, for obvious reasons the builtin acc_on_device uses int
type argument rather than enum which isn't defined yet when the builtin
is created, while the OpenACC spec requires it to have acc_device_t
enum argument.  The header makes sure it has int underlying type by using
negative and __INT_MAX__ enumerators.

I've tried to make the builtin typegeneric or just varargs, but that
changes behavior e.g. when one calls it with some C++ class which has
cast operator to acc_device_t, so the following patch instead disables
the warning for this builtin.

2023-04-20  Jakub Jelinek  <jakub@redhat.com>

PR c/107041
* c-decl.cc (diagnose_mismatched_decls): Avoid -Wenum-int-mismatch
warning on acc_on_device declaration.

* gcc.dg/goacc/pr107041.c: New test.

2 years ago[LRA]: Exclude some hard regs for multi-reg inout reload pseudos used in asm in diffe...
Vladimir N. Makarov [Thu, 20 Apr 2023 14:02:13 +0000 (10:02 -0400)] 
[LRA]: Exclude some hard regs for multi-reg inout reload pseudos used in asm in different mode

See gcc.c-torture/execute/20030222-1.c.  Consider the code for 32-bit (e.g. BE) target:
  int i, v; long x; x = v; asm ("" : "=r" (i) : "0" (x));
We generate the following RTL with reload insns:
  1. subreg:si(x:di, 0) = 0;
  2. subreg:si(x:di, 4) = v:si;
  3. t:di = x:di, dead x;
  4. asm ("" : "=r" (subreg:si(t:di,4)) : "0" (t:di))
  5. i:si = subreg:si(t:di,4);
If we assign hard reg of x to t, dead code elimination will remove insn #2
and we will use unitialized hard reg.  So exclude the hard reg of x for t.
We could ignore this problem for non-empty asm using all x value but it is hard to
check that the asm are expanded into insn realy using x and setting r.
The old reload pass used the same approach.

gcc/ChangeLog

* lra-constraints.cc (match_reload): Exclude some hard regs for
multi-reg inout reload pseudos used in asm in different mode.

2 years agoarch: Use VIRTUAL_REGISTER_P predicate.
Uros Bizjak [Thu, 20 Apr 2023 15:00:24 +0000 (17:00 +0200)] 
arch: Use VIRTUAL_REGISTER_P predicate.

gcc/ChangeLog:

* config/arm/arm.cc (thumb1_legitimate_address_p):
Use VIRTUAL_REGISTER_P predicate.
(arm_eliminable_register): Ditto.
* config/avr/avr.md (push<mode>_1): Ditto.
* config/bfin/predicates.md (register_no_elim_operand): Ditto.
* config/h8300/predicates.md (register_no_sp_elim_operand): Ditto.
* config/i386/predicates.md (register_no_elim_operand): Ditto.
* config/iq2000/predicates.md (call_insn_operand): Ditto.
* config/microblaze/microblaze.h (CALL_INSN_OP): Ditto.

2 years agoi386: Handle sign-extract for QImode operations with high registers [PR78952]
Uros Bizjak [Thu, 20 Apr 2023 14:51:56 +0000 (16:51 +0200)] 
i386: Handle sign-extract for QImode operations with high registers [PR78952]

Introduce extract_operator predicate to handle both, zero-extract and
sign-extract extract operations with expressions like:

    (subreg:QI
      (zero_extract:SWI248
        (match_operand 1 "int248_register_operand" "0")
(const_int 8)
(const_int 8)) 0)

As shown in the testcase, this will enable generation of QImode
instructions with high registers when signed arguments are used.

gcc/ChangeLog:

PR target/78952
* config/i386/predicates.md (extract_operator): New predicate.
* config/i386/i386.md (any_extract): Remove code iterator.
(*cmpqi_ext<mode>_1_mem_rex64): Use extract_operator predicate.
(*cmpqi_ext<mode>_1): Ditto.
(*cmpqi_ext<mode>_2): Ditto.
(*cmpqi_ext<mode>_3_mem_rex64): Ditto.
(*cmpqi_ext<mode>_3): Ditto.
(*cmpqi_ext<mode>_4): Ditto.
(*extzvqi_mem_rex64): Ditto.
(*extzvqi): Ditto.
(*insvqi_2): Ditto.
(*extendqi<SWI24:mode>_ext_1): Ditto.
(*addqi_ext<mode>_0): Ditto.
(*addqi_ext<mode>_1): Ditto.
(*addqi_ext<mode>_2): Ditto.
(*subqi_ext<mode>_0): Ditto.
(*subqi_ext<mode>_2): Ditto.
(*testqi_ext<mode>_1): Ditto.
(*testqi_ext<mode>_2): Ditto.
(*andqi_ext<mode>_0): Ditto.
(*andqi_ext<mode>_1): Ditto.
(*andqi_ext<mode>_1_cc): Ditto.
(*andqi_ext<mode>_2): Ditto.
(*<any_or:code>qi_ext<mode>_0): Ditto.
(*<any_or:code>qi_ext<mode>_1): Ditto.
(*<any_or:code>qi_ext<mode>_2): Ditto.
(*xorqi_ext<mode>_1_cc): Ditto.
(*negqi_ext<mode>_2): Ditto.
(*ashlqi_ext<mode>_2): Ditto.
(*<any_shiftrt:insn>qi_ext<mode>_2): Ditto.

gcc/testsuite/ChangeLog:

PR target/78952
* gcc.target/i386/pr78952-4.c: New test.

2 years ago[PR target/108248] [RISC-V] Break down some bitmanip insn types
Raphael Zinsly [Thu, 20 Apr 2023 14:48:08 +0000 (08:48 -0600)] 
[PR target/108248] [RISC-V] Break down some bitmanip insn types

This is primarily Raphael's work.  All I did was adjust it to apply to the
trunk and add the new types to generic.md's scheduling model.

The basic idea here is to make sure we have the ability to schedule the
bitmanip instructions with a finer degree of control.  Some of the bitmanip
instructions are likely to have differing scheduler characteristics across
different implementations.

So rather than assign these instructions a generic "bitmanip" type, this
patch assigns them a type based on their RTL code by using the <bitmanip_insn>
iterator for the type.  Naturally we have to add a few new types.  It affects
clz, ctz, cpop, min, max.

We didn't do this for things like shNadd, single bit manipulation, etc. We
certainly could if the needs presents itself.

I threw all the new types into the generic_alu bucket in the generic
scheduling model.  Seems as good a place as any. Someone who knows the
sifive uarch should probably add these types (and bitmanip) to the sifive
scheduling model.

We also noticed that the recently added orc.b didn't have a type at all.
So we added it as a generic bitmanip type.

This has been bootstrapped in a gcc-12 base and I've built and run the
testsuite without regressions on the trunk.

Given it was primarily Raphael's work I could probably approve & commit it.
But I'd like to give the other RISC-V folks a chance to chime in.

PR target/108248
gcc/
* config/riscv/bitmanip.md (clz, ctz, pcnt, min, max patterns): Use
<bitmanip_insn> as the type to allow for fine grained control of
scheduling these insns.
* config/riscv/generic.md (generic_alu): Add bitmanip, clz, ctz, pcnt,
min, max.
* config/riscv/riscv.md (type attribute): Add types for clz, ctz,
pcnt, signed and unsigned min/max.

2 years agoRISC-V: Fix RVV register order
Juzhe-Zhong [Fri, 24 Mar 2023 06:57:25 +0000 (14:57 +0800)] 
RISC-V: Fix RVV register order

This patch fixes the issue of incorrect reigster order of RVV.
The new register order is coming from kito original RVV GCC implementation.

Consider this case:
void f (void *base,void *base2,void *out,size_t vl, int n)
{
    vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base + 100, vl);
    for (int i = 0; i < n; i++){
      vbool8_t m = __riscv_vlm_v_b8 (base + i, vl);
      vuint64m8_t v = __riscv_vluxei64_v_u64m8_m(m,base,bindex,vl);
      vuint64m8_t v2 = __riscv_vle64_v_u64m8_tu (v, base2 + i, vl);
      vint8m1_t v3 = __riscv_vluxei64_v_i8m1_m(m,base,v,vl);
      vint8m1_t v4 = __riscv_vluxei64_v_i8m1_m(m,base,v2,vl);
      __riscv_vse8_v_i8m1 (out + 100*i,v3,vl);
      __riscv_vse8_v_i8m1 (out + 222*i,v4,vl);
    }
}

Before this patch:
f:
csrr    t0,vlenb
slli    t1,t0,3
sub     sp,sp,t1
addi    a5,a0,100
vsetvli zero,a3,e64,m8,ta,ma
vle64.v v24,0(a5)
vs8r.v  v24,0(sp)
ble     a4,zero,.L1
mv      a6,a0
add     a4,a4,a0
mv      a5,a2
.L3:
vsetvli zero,zero,e64,m8,ta,ma
vl8re64.v       v24,0(sp)
vlm.v   v0,0(a6)
vluxei64.v      v24,(a0),v24,v0.t
addi    a6,a6,1
vsetvli zero,zero,e8,m1,tu,ma
vmv8r.v v16,v24
vluxei64.v      v8,(a0),v24,v0.t
vle64.v v16,0(a1)
vluxei64.v      v24,(a0),v16,v0.t
vse8.v  v8,0(a2)
vse8.v  v24,0(a5)
addi    a1,a1,1
addi    a2,a2,100
addi    a5,a5,222
bne     a4,a6,.L3
.L1:
csrr    t0,vlenb
slli    t1,t0,3
add     sp,sp,t1
jr      ra

After this patch:
f:
addi    a5,a0,100
vsetvli zero,a3,e64,m8,ta,ma
vle64.v v24,0(a5)
ble     a4,zero,.L1
mv      a6,a0
add     a4,a4,a0
mv      a5,a2
.L3:
vsetvli zero,zero,e64,m8,ta,ma
vlm.v   v0,0(a6)
addi    a6,a6,1
vluxei64.v      v8,(a0),v24,v0.t
vsetvli zero,zero,e8,m1,tu,ma
vmv8r.v v16,v8
vluxei64.v      v2,(a0),v8,v0.t
vle64.v v16,0(a1)
vluxei64.v      v1,(a0),v16,v0.t
vse8.v  v2,0(a2)
vse8.v  v1,0(a5)
addi    a1,a1,1
addi    a2,a2,100
addi    a5,a5,222
bne     a4,a6,.L3
.L1:
ret

The redundant register spillings is eliminated.
However, there is one more issue need to be addressed which is the redundant
move instruction "vmv8r.v". This is another story, and it will be fixed by another
patch (Fine tune RVV machine description RA constraint).

gcc/ChangeLog:

* config/riscv/riscv.h (enum reg_class): Fix RVV register order.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/spill-4.c: Adapt testcase.
* gcc.target/riscv/rvv/base/spill-6.c: Adapt testcase.
* gcc.target/riscv/rvv/base/reg_order-1.c: New test.

Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Co-authored-by: kito-cheng <kito.cheng@sifive.com>
2 years agoRISC-V: Fix riscv/arch-19.c with different ISA spec version
Kito Cheng [Thu, 20 Apr 2023 13:15:37 +0000 (21:15 +0800)] 
RISC-V: Fix riscv/arch-19.c with different ISA spec version

In newer ISA spec, F will implied zicsr, add that into -march option to
prevent different test result on different default -misa-spec version.

gcc/testsuite/

* gcc.target/riscv/arch-19.c: Add -misa-spec.

2 years agoRISC-V: Fix wrong check of register occurrences [PR109535]
Ju-Zhe Zhong [Wed, 19 Apr 2023 10:41:51 +0000 (18:41 +0800)] 
RISC-V: Fix wrong check of register occurrences [PR109535]

count_occurrences will conly count same RTX (same code and same mode),
but what we want to track is the occurrence of a register, a register
might appeared in the insn with different mode or contain in SUBREG.

Testcase coming from Kito.

gcc/ChangeLog:

PR target/109535
* config/riscv/riscv-vsetvl.cc (count_regno_occurrences): New function.
(pass_vsetvl::cleanup_insns): Fix bug.

gcc/testsuite/ChangeLog:

PR target/109535
* g++.target/riscv/rvv/base/pr109535.C: New test.
* gcc.target/riscv/rvv/base/pr109535.c: New test.

Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Co-authored-by: kito-cheng <kito.cheng@sifive.com>
2 years agoRISC-V: Fix simplify_ior_optimization.c on rv32
Kito Cheng [Thu, 20 Apr 2023 12:03:24 +0000 (20:03 +0800)] 
RISC-V: Fix simplify_ior_optimization.c on rv32

GCC will complaint if target ABI isn't have corresponding multi-lib on
glibc toolchain, use stdint-gcc.h to suppress that.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/simplify_ior_optimization.c: Use stdint-gcc.h
rather than stdint.h

2 years agoamdgcn: bug fix ldexp insn
Andrew Stubbs [Thu, 20 Apr 2023 10:11:13 +0000 (11:11 +0100)] 
amdgcn: bug fix ldexp insn

The vop3 instructions don't support B constraint immediates.
Also, take the use the SV_FP iterator to delete a redundant pattern.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (vnsi, VnSI): Add scalar modes.
(ldexp<mode>3): Delete.
(ldexp<mode>3<exec>): Change "B" to "A".

2 years agoamdgcn: update target-supports.exp
Andrew Stubbs [Tue, 18 Apr 2023 11:03:43 +0000 (12:03 +0100)] 
amdgcn: update target-supports.exp

The backend can now vectorize more things.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_vect_call_copysignf): Add amdgcn.
(check_effective_target_vect_call_sqrtf): Add amdgcn.
(check_effective_target_vect_call_ceilf): Add amdgcn.
(check_effective_target_vect_call_floor): Add amdgcn.
(check_effective_target_vect_logical_reduc): Add amdgcn.

2 years agotree: Add 3+ argument fndecl_built_in_p
Jakub Jelinek [Thu, 20 Apr 2023 11:02:52 +0000 (13:02 +0200)] 
tree: Add 3+ argument fndecl_built_in_p

On Wed, Feb 22, 2023 at 09:52:06AM +0000, Richard Biener wrote:
> > The following testcase ICEs because we still have some spots that
> > treat BUILT_IN_UNREACHABLE specially but not BUILT_IN_UNREACHABLE_TRAP
> > the same.

This patch uses (fndecl_built_in_p (node, BUILT_IN_UNREACHABLE)
                 || fndecl_built_in_p (node, BUILT_IN_UNREACHABLE_TRAP))
a lot and from grepping around, we do something like that in lots of
other places, or in some spots instead as
(fndecl_built_in_p (node, BUILT_IN_NORMAL)
 && (DECL_FUNCTION_CODE (node) == BUILT_IN_WHATEVER1
     || DECL_FUNCTION_CODE (node) == BUILT_IN_WHATEVER2))
The following patch adds an overload for this case, so we can write
it in a shorter way, using C++11 argument packs so that it supports
as many codes as one needs.

2023-04-20  Jakub Jelinek  <jakub@redhat.com>
    Jonathan Wakely  <jwakely@redhat.com>

* tree.h (built_in_function_equal_p): New helper function.
(fndecl_built_in_p): Turn into variadic template to support
1 or more built_in_function arguments.
* builtins.cc (fold_builtin_expect): Use 3 argument fndecl_built_in_p.
* gimplify.cc (goa_stabilize_expr): Likewise.
* cgraphclones.cc (cgraph_node::create_clone): Likewise.
* ipa-fnsummary.cc (compute_fn_summary): Likewise.
* omp-low.cc (setjmp_or_longjmp_p): Likewise.
* cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee,
cgraph_update_edges_for_call_stmt_node,
cgraph_edge::verify_corresponds_to_fndecl,
cgraph_node::verify_node): Likewise.
* tree-stdarg.cc (optimize_va_list_gpr_fpr_size): Likewise.
* gimple-ssa-warn-access.cc (matching_alloc_calls_p): Likewise.
* ipa-prop.cc (try_make_edge_direct_virtual_call): Likewise.

2 years agotree-vect-patterns: Pattern recognize ctz or ffs using clz, popcount or ctz [PR109011]
Jakub Jelinek [Thu, 20 Apr 2023 09:55:16 +0000 (11:55 +0200)] 
tree-vect-patterns: Pattern recognize ctz or ffs using clz, popcount or ctz [PR109011]

The following patch allows to vectorize __builtin_ffs*/.FFS even if
we just have vector .CTZ support, or __builtin_ffs*/.FFS/__builtin_ctz*/.CTZ
if we just have vector .CLZ or .POPCOUNT support.
It uses various expansions from Hacker's Delight book as well as GCC's
expansion, in particular:
.CTZ (X) = PREC - .CLZ ((X - 1) & ~X)
.CTZ (X) = .POPCOUNT ((X - 1) & ~X)
.CTZ (X) = (PREC - 1) - .CLZ (X & -X)
.FFS (X) = PREC - .CLZ (X & -X)
.CTZ (X) = PREC - .POPCOUNT (X | -X)
.FFS (X) = (PREC + 1) - .POPCOUNT (X | -X)
.FFS (X) = .CTZ (X) + 1
where the first one can be only used if both CTZ and CLZ have value
defined at zero (kind 2) and both have value of PREC there.
If the original has value defined at zero and the latter doesn't
for other forms or if it doesn't have matching value for that case,
a COND_EXPR is added for that afterwards.

The patch also modifies vect_recog_popcount_clz_ctz_ffs_pattern
such that the two can work together.

2023-04-20  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/109011
* tree-vect-patterns.cc (vect_recog_ctz_ffs_pattern): New function.
(vect_recog_popcount_clz_ctz_ffs_pattern): Move vect_pattern_detected
call later.  Don't punt for IFN_CTZ or IFN_FFS if it doesn't have
direct optab support, but has instead IFN_CLZ, IFN_POPCOUNT or
for IFN_FFS IFN_CTZ support, use vect_recog_ctz_ffs_pattern for that
case.
(vect_vect_recog_func_ptrs): Add ctz_ffs entry.

* gcc.dg/vect/pr109011-1.c: Remove -mpower9-vector from
dg-additional-options.
(baz, qux): Remove functions and corresponding dg-final.
* gcc.dg/vect/pr109011-2.c: New test.
* gcc.dg/vect/pr109011-3.c: New test.
* gcc.dg/vect/pr109011-4.c: New test.
* gcc.dg/vect/pr109011-5.c: New test.

2 years agoRemove duplicate DFS walks from DF init
Richard Biener [Mon, 20 Feb 2023 14:02:43 +0000 (15:02 +0100)] 
Remove duplicate DFS walks from DF init

The following removes unused CFG order computes from
rest_of_handle_df_initialize.  The CFG orders are computed from df_analyze ().
This also removes code duplication that would have to be kept in sync.

* df-core.cc (rest_of_handle_df_initialize): Remove
computation of df->postorder, df->postorder_inverted and
df->n_blocks.

2 years agotestsuite: Fix up g++.dg/ext/int128-8.C testcase [PR109560]
Jakub Jelinek [Thu, 20 Apr 2023 07:43:04 +0000 (09:43 +0200)] 
testsuite: Fix up g++.dg/ext/int128-8.C testcase [PR109560]

The testcase needs to be restricted to int128 effective targets,
it expectedly fails on i386 and other 32-bit targets.

2023-04-20  Jakub Jelinek  <jakub@redhat.com>

PR c++/108099
PR testsuite/109560
* g++.dg/ext/int128-8.C: Require int128 effective target.

2 years agoPR testsuite/106879 FAIL: gcc.dg/vect/bb-slp-layout-19.c on powerpc64
Jiufu Guo [Tue, 18 Apr 2023 07:56:53 +0000 (15:56 +0800)] 
PR testsuite/106879 FAIL: gcc.dg/vect/bb-slp-layout-19.c on powerpc64

On P7, option -mno-allow-movmisalign is added during testing, which
prevents slp happen on the case.

Like PR65484 and PR87306, this patch use vect_hw_misalign to guard
the case on powerpc targets.

gcc/testsuite/ChangeLog:

PR testsuite/106879
* gcc.dg/vect/bb-slp-layout-19.c: Modify to guard the check with
vect_hw_misalign on POWERs.

2 years agoi386: Share AES xmm intrin with VAES
Haochen Jiang [Fri, 10 Mar 2023 05:40:09 +0000 (13:40 +0800)] 
i386: Share AES xmm intrin with VAES

Currently in GCC, the 128 bit intrin for instruction vaes{end,dec}{last,}
is under AES ISA. Because there is no dependency between ISA set AES
and VAES, The 128 bit intrin is not available when we use compiler flag
-mvaes -mavx512vl and there is no other way to use that intrin. But it
should according to Intel SDM.

Although VAES aims to be a VEX/EVEX promotion for AES, but it is only part
of it. Therefore, we share the AES xmm intrin with VAES.

Also, since -mvaes indicates that we could use VEX encoding for ymm, we
should imply AVX for VAES.

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX_UNSET): Add OPTION_MASK_ISA2_VAES_UNSET.
(ix86_handle_option): Set AVX flag for VAES.
* config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins):
Add OPTION_MASK_ISA2_VAES_UNSET.
(def_builtin): Share builtin between AES and VAES.
* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
Ditto.
* config/i386/i386.md (aes): New isa attribute.
* config/i386/sse.md (aesenc): Add pattern for VAES with xmm.
(aesenclast): Ditto.
(aesdec): Ditto.
(aesdeclast): Ditto.
* config/i386/vaesintrin.h: Remove redundant avx target push.
* config/i386/wmmintrin.h (_mm_aesdec_si128): Change to macro.
(_mm_aesdeclast_si128): Ditto.
(_mm_aesenc_si128): Ditto.
(_mm_aesenclast_si128): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fvl-vaes-1.c: Add VAES xmm test.
* gcc.target/i386/pr109117-1.c: Modify error message.

2 years agoAdd reduce_*_ep[i|u][8|16] series intrinsics
Hu, Lin1 [Thu, 16 Feb 2023 01:10:16 +0000 (09:10 +0800)] 
Add reduce_*_ep[i|u][8|16] series intrinsics

gcc/ChangeLog:

* config/i386/avx2intrin.h
(_MM_REDUCE_OPERATOR_BASIC_EPI16): New macro.
(_MM_REDUCE_OPERATOR_MAX_MIN_EP16): Ditto.
(_MM256_REDUCE_OPERATOR_BASIC_EPI16): Ditto.
(_MM256_REDUCE_OPERATOR_MAX_MIN_EP16): Ditto.
(_MM_REDUCE_OPERATOR_BASIC_EPI8): Ditto.
(_MM_REDUCE_OPERATOR_MAX_MIN_EP8): Ditto.
(_MM256_REDUCE_OPERATOR_BASIC_EPI8): Ditto.
(_MM256_REDUCE_OPERATOR_MAX_MIN_EP8): Ditto.
(_mm_reduce_add_epi16): New instrinsics.
(_mm_reduce_mul_epi16): Ditto.
(_mm_reduce_and_epi16): Ditto.
(_mm_reduce_or_epi16): Ditto.
(_mm_reduce_max_epi16): Ditto.
(_mm_reduce_max_epu16): Ditto.
(_mm_reduce_min_epi16): Ditto.
(_mm_reduce_min_epu16): Ditto.
(_mm256_reduce_add_epi16): Ditto.
(_mm256_reduce_mul_epi16): Ditto.
(_mm256_reduce_and_epi16): Ditto.
(_mm256_reduce_or_epi16): Ditto.
(_mm256_reduce_max_epi16): Ditto.
(_mm256_reduce_max_epu16): Ditto.
(_mm256_reduce_min_epi16): Ditto.
(_mm256_reduce_min_epu16): Ditto.
(_mm_reduce_add_epi8): Ditto.
(_mm_reduce_mul_epi8): Ditto.
(_mm_reduce_and_epi8): Ditto.
(_mm_reduce_or_epi8): Ditto.
(_mm_reduce_max_epi8): Ditto.
(_mm_reduce_max_epu8): Ditto.
(_mm_reduce_min_epi8): Ditto.
(_mm_reduce_min_epu8): Ditto.
(_mm256_reduce_add_epi8): Ditto.
(_mm256_reduce_mul_epi8): Ditto.
(_mm256_reduce_and_epi8): Ditto.
(_mm256_reduce_or_epi8): Ditto.
(_mm256_reduce_max_epi8): Ditto.
(_mm256_reduce_max_epu8): Ditto.
(_mm256_reduce_min_epi8): Ditto.
(_mm256_reduce_min_epu8): Ditto.
* config/i386/avx512vlbwintrin.h:
(_mm_mask_reduce_add_epi16): Ditto.
(_mm_mask_reduce_mul_epi16): Ditto.
(_mm_mask_reduce_and_epi16): Ditto.
(_mm_mask_reduce_or_epi16): Ditto.
(_mm_mask_reduce_max_epi16): Ditto.
(_mm_mask_reduce_max_epu16): Ditto.
(_mm_mask_reduce_min_epi16): Ditto.
(_mm_mask_reduce_min_epu16): Ditto.
(_mm256_mask_reduce_add_epi16): Ditto.
(_mm256_mask_reduce_mul_epi16): Ditto.
(_mm256_mask_reduce_and_epi16): Ditto.
(_mm256_mask_reduce_or_epi16): Ditto.
(_mm256_mask_reduce_max_epi16): Ditto.
(_mm256_mask_reduce_max_epu16): Ditto.
(_mm256_mask_reduce_min_epi16): Ditto.
(_mm256_mask_reduce_min_epu16): Ditto.
(_mm_mask_reduce_add_epi8): Ditto.
(_mm_mask_reduce_mul_epi8): Ditto.
(_mm_mask_reduce_and_epi8): Ditto.
(_mm_mask_reduce_or_epi8): Ditto.
(_mm_mask_reduce_max_epi8): Ditto.
(_mm_mask_reduce_max_epu8): Ditto.
(_mm_mask_reduce_min_epi8): Ditto.
(_mm_mask_reduce_min_epu8): Ditto.
(_mm256_mask_reduce_add_epi8): Ditto.
(_mm256_mask_reduce_mul_epi8): Ditto.
(_mm256_mask_reduce_and_epi8): Ditto.
(_mm256_mask_reduce_or_epi8): Ditto.
(_mm256_mask_reduce_max_epi8): Ditto.
(_mm256_mask_reduce_max_epu8): Ditto.
(_mm256_mask_reduce_min_epi8): Ditto.
(_mm256_mask_reduce_min_epu8): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512vlbw-reduce-op-1.c: New test.

2 years agoi386: Add PCLMUL dependency for VPCLMULQDQ
Haochen Jiang [Fri, 10 Mar 2023 02:38:50 +0000 (10:38 +0800)] 
i386: Add PCLMUL dependency for VPCLMULQDQ

Currently in GCC, the 128 bit intrin for instruction vpclmulqdq is
under PCLMUL ISA. Because there is no dependency between ISA set PCLMUL
and VPCLMULQDQ, The 128 bit intrin is not available when we just use
compiler flag -mvpclmulqdq. But it should according to Intel SDM.

Since VPCLMULQDQ is a VEX/EVEX promotion for PCLMUL, it is natural to
add dependency between them.

Also, with -mvpclmulqdq, we can use ymm under VEX encoding, so
VPCLMULQDQ should imply AVX.

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(OPTION_MASK_ISA_VPCLMULQDQ_SET):
Add OPTION_MASK_ISA_PCLMUL_SET and OPTION_MASK_ISA_AVX_SET.
(OPTION_MASK_ISA_AVX_UNSET):
Add OPTION_MASK_ISA_VPCLMULQDQ_UNSET.
(OPTION_MASK_ISA_PCLMUL_UNSET): Ditto.
* config/i386/i386.md (vpclmulqdqvl): New.
* config/i386/sse.md (pclmulqdq): Add evex encoding.
* config/i386/vpclmulqdqintrin.h: Remove redudant avx target
push.

gcc/testsuite/ChangeLog:

* gcc.target/i386/vpclmulqdq.c: Add compile test for xmm.

2 years agoi386: Fix vpblendm{b,w} intrins and insns
Haochen Jiang [Tue, 10 Jan 2023 01:35:08 +0000 (09:35 +0800)] 
i386: Fix vpblendm{b,w} intrins and insns

For vpblendm{b,w}, they actually do not have constant parameters.
Therefore, there is no need for them been wrapped in __OPTIMIZE__.

Also, we should check TARGET_AVX512VL for 128/256 bit vectors.

gcc/ChangeLog:

* config/i386/avx512vlbwintrin.h
(_mm_mask_blend_epi16): Remove __OPTIMIZE__ wrapper.
(_mm_mask_blend_epi8): Ditto.
(_mm256_mask_blend_epi16): Ditto.
(_mm256_mask_blend_epi8): Ditto.
* config/i386/avx512vlintrin.h
(_mm256_mask_blend_pd): Ditto.
(_mm256_mask_blend_ps): Ditto.
(_mm256_mask_blend_epi64): Ditto.
(_mm256_mask_blend_epi32): Ditto.
(_mm_mask_blend_pd): Ditto.
(_mm_mask_blend_ps): Ditto.
(_mm_mask_blend_epi64): Ditto.
(_mm_mask_blend_epi32): Ditto.
* config/i386/sse.md (VF_AVX512BWHFBF16): Removed.
(VF_AVX512HFBFVL): Move it before the first usage.
(<avx512>_blendm<mode>): Change iterator from VF_AVX512BWHFBF16
to VF_AVX512HFBFVL.

2 years agoi386: Add AVX512BW dependency to AVX512VBMI2
Haochen Jiang [Mon, 9 Jan 2023 08:41:17 +0000 (16:41 +0800)] 
i386: Add AVX512BW dependency to AVX512VBMI2

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(OPTION_MASK_ISA_AVX512VBMI2_SET): Change OPTION_MASK_ISA_AVX512F_SET
to OPTION_MASK_ISA_AVX512BW_SET.
(OPTION_MASK_ISA_AVX512F_UNSET):
Remove OPTION_MASK_ISA_AVX512VBMI2_UNSET.
(OPTION_MASK_ISA_AVX512BW_UNSET):
Add OPTION_MASK_ISA_AVX512VBMI2_UNSET.
* config/i386/avx512vbmi2intrin.h: Do not push avx512bw.
* config/i386/avx512vbmi2vlintrin.h: Ditto.
* config/i386/i386-builtin.def: Remove OPTION_MASK_ISA_AVX512BW.
* config/i386/sse.md (VI12_AVX512VLBW): Removed.
(VI12_VI48F_AVX512VLBW): Rename to VI12_VI48F_AVX512VL.
(compress<mode>_mask): Change iterator from VI12_AVX512VLBW to
VI12_AVX512VL.
(compressstore<mode>_mask): Ditto.
(expand<mode>_mask): Ditto.
(expand<mode>_maskz): Ditto.
(*expand<mode>_mask): Change iterator from VI12_VI48F_AVX512VLBW to
VI12_VI48F_AVX512VL.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bw-pr100267-1.c: Remove avx512f and avx512bw.
* gcc.target/i386/avx512bw-pr100267-b-2.c: Ditto.
* gcc.target/i386/avx512bw-pr100267-d-2.c: Ditto.
* gcc.target/i386/avx512bw-pr100267-q-2.c: Ditto.
* gcc.target/i386/avx512bw-pr100267-w-2.c: Ditto.
* gcc.target/i386/avx512f-vpcompressb-1.c: Ditto.
* gcc.target/i386/avx512f-vpcompressb-2.c: Ditto.
* gcc.target/i386/avx512f-vpcompressw-1.c: Ditto.
* gcc.target/i386/avx512f-vpcompressw-2.c: Ditto.
* gcc.target/i386/avx512f-vpexpandb-1.c: Ditto.
* gcc.target/i386/avx512f-vpexpandb-2.c: Ditto.
* gcc.target/i386/avx512f-vpexpandw-1.c: Ditto.
* gcc.target/i386/avx512f-vpexpandw-2.c: Ditto.
* gcc.target/i386/avx512f-vpshld-1.c: Ditto.
* gcc.target/i386/avx512f-vpshldd-2.c: Ditto.
* gcc.target/i386/avx512f-vpshldq-2.c: Ditto.
* gcc.target/i386/avx512f-vpshldv-1.c: Ditto.
* gcc.target/i386/avx512f-vpshldvd-2.c: Ditto.
* gcc.target/i386/avx512f-vpshldvq-2.c: Ditto.
* gcc.target/i386/avx512f-vpshldvw-2.c: Ditto.
* gcc.target/i386/avx512f-vpshrdd-2.c: Ditto.
* gcc.target/i386/avx512f-vpshrdq-2.c: Ditto.
* gcc.target/i386/avx512f-vpshrdv-1.c: Ditto.
* gcc.target/i386/avx512f-vpshrdvd-2.c: Ditto.
* gcc.target/i386/avx512f-vpshrdvq-2.c: Ditto.
* gcc.target/i386/avx512f-vpshrdvw-2.c: Ditto.
* gcc.target/i386/avx512f-vpshrdw-2.c: Ditto.
* gcc.target/i386/avx512vbmi2-vpshld-1.c: Ditto.
* gcc.target/i386/avx512vbmi2-vpshrd-1.c: Ditto.
* gcc.target/i386/avx512vl-vpcompressb-1.c: Ditto.
* gcc.target/i386/avx512vl-vpcompressb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpcompressw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpexpandb-1.c: Ditto.
* gcc.target/i386/avx512vl-vpexpandb-2.c: Ditto.
* gcc.target/i386/avx512vl-vpexpandw-1.c: Ditto.
* gcc.target/i386/avx512vl-vpexpandw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpshldd-2.c: Ditto.
* gcc.target/i386/avx512vl-vpshldq-2.c: Ditto.
* gcc.target/i386/avx512vl-vpshldv-1.c: Ditto.
* gcc.target/i386/avx512vl-vpshldvd-2.c: Ditto.
* gcc.target/i386/avx512vl-vpshldvq-2.c: Ditto.
* gcc.target/i386/avx512vl-vpshldvw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpshrdd-2.c: Ditto.
* gcc.target/i386/avx512vl-vpshrdq-2.c: Ditto.
* gcc.target/i386/avx512vl-vpshrdv-1.c: Ditto.
* gcc.target/i386/avx512vl-vpshrdvd-2.c: Ditto.
* gcc.target/i386/avx512vl-vpshrdvq-2.c: Ditto.
* gcc.target/i386/avx512vl-vpshrdvw-2.c: Ditto.
* gcc.target/i386/avx512vl-vpshrdw-2.c: Ditto.
* gcc.target/i386/avx512vlbw-pr100267-1.c: Ditto.
* gcc.target/i386/avx512vlbw-pr100267-b-2.c: Ditto.
* gcc.target/i386/avx512vlbw-pr100267-w-2.c: Ditto.

2 years agoi386: Add AVX512BW dependency to AVX512BITALG
Haochen Jiang [Thu, 5 Jan 2023 06:58:14 +0000 (14:58 +0800)] 
i386: Add AVX512BW dependency to AVX512BITALG

Since some of the AVX512BITALG intrins use 32/64 bit mask,
AVX512BW should be implied.

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(OPTION_MASK_ISA_AVX512BITALG_SET):
Change OPTION_MASK_ISA_AVX512F_SET
to OPTION_MASK_ISA_AVX512BW_SET.
(OPTION_MASK_ISA_AVX512F_UNSET):
Remove OPTION_MASK_ISA_AVX512BITALG_SET.
(OPTION_MASK_ISA_AVX512BW_UNSET):
Add OPTION_MASK_ISA_AVX512BITALG_SET.
* config/i386/avx512bitalgintrin.h: Do not push avx512bw.
* config/i386/i386-builtin.def:
Remove redundant OPTION_MASK_ISA_AVX512BW.
* config/i386/sse.md (VI1_AVX512VLBW): Removed.
(avx512vl_vpshufbitqmb<mode><mask_scalar_merge_name>):
Change the iterator from VI1_AVX512VLBW to VI1_AVX512VL.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bitalg-vpopcntb-1.c:
Remove avx512bw.
* gcc.target/i386/avx512bitalg-vpopcntb.c: Ditto.
* gcc.target/i386/avx512bitalg-vpopcntbvl.c: Ditto.
* gcc.target/i386/avx512bitalg-vpopcntw-1.c: Ditto.
* gcc.target/i386/avx512bitalg-vpopcntw.c: Ditto.
* gcc.target/i386/avx512bitalg-vpopcntwvl.c: Ditto.
* gcc.target/i386/avx512bitalg-vpshufbitqmb-1.c: Ditto.
* gcc.target/i386/avx512bitalg-vpshufbitqmb.c: Ditto.
* gcc.target/i386/avx512bitalgvl-vpopcntb-1.c: Ditto.
* gcc.target/i386/avx512bitalgvl-vpopcntw-1.c: Ditto.
* gcc.target/i386/avx512bitalgvl-vpshufbitqmb-1.c: Ditto.
* gcc.target/i386/pr93696-1.c: Ditto.
* gcc.target/i386/pr93696-2.c: Ditto.

2 years agoi386: Use macro to wrap up share builtin exceptions in builtin isa check
Haochen Jiang [Thu, 15 Dec 2022 03:10:16 +0000 (11:10 +0800)] 
i386: Use macro to wrap up share builtin exceptions in builtin isa check

gcc/ChangeLog:

* config/i386/i386-expand.cc
(ix86_check_builtin_isa_match): Correct wrong comments.
Add a new macro SHARE_BUILTIN and refactor the current if
clauses to macro.

2 years agoRe-arrange sections of i386 cpuid
Mo, Zewei [Tue, 10 Jan 2023 08:11:02 +0000 (16:11 +0800)] 
Re-arrange sections of i386 cpuid

gcc/ChangeLog:

* config/i386/cpuid.h: Open a new section for Extended Features
Leaf (%eax == 7, %ecx == 0) and Extended Features Sub-leaf (%eax == 7,
%ecx == 1).

2 years agoOptimize vshuf{i,f}{32x4,64x2} ymm and vperm{i,f}128 ymm
Hu, Lin1 [Mon, 16 Jan 2023 03:23:09 +0000 (11:23 +0800)] 
Optimize vshuf{i,f}{32x4,64x2} ymm and vperm{i,f}128 ymm

vshuf{i,f}{32x4,64x2} ymm and vperm{i,f}128 ymm are 3 clk.
We can optimze them to vblend, vmovaps when there's no cross-lane.

gcc/ChangeLog:

* config/i386/sse.md: Modify insn vperm{i,f}
and vshuf{i,f}.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512vl-vshuff32x4-1.c: Modify test.
* gcc.target/i386/avx512vl-vshuff64x2-1.c: Ditto.
* gcc.target/i386/avx512vl-vshufi32x4-1.c: Ditto.
* gcc.target/i386/avx512vl-vshufi64x2-1.c: Ditto.
* gcc.target/i386/opt-vperm-vshuf-1.c: New test.
* gcc.target/i386/opt-vperm-vshuf-2.c: Ditto.
* gcc.target/i386/opt-vperm-vshuf-3.c: Ditto.

2 years agoDaily bump.
GCC Administrator [Thu, 20 Apr 2023 00:17:12 +0000 (00:17 +0000)] 
Daily bump.

2 years agogcc: xtensa: add -m[no-]strict-align option
Max Filippov [Tue, 28 Feb 2023 13:46:29 +0000 (05:46 -0800)] 
gcc: xtensa: add -m[no-]strict-align option

gcc/
* config/xtensa/xtensa-opts.h: New header.
* config/xtensa/xtensa.h (STRICT_ALIGNMENT): Redefine as
xtensa_strict_align.
* config/xtensa/xtensa.cc (xtensa_option_override): When
-m[no-]strict-align is not specified in the command line set
xtensa_strict_align to 0 if the hardware supports both unaligned
loads and stores or to 1 otherwise.
* config/xtensa/xtensa.opt (mstrict-align): New option.
* doc/invoke.texi (Xtensa Options): Document -m[no-]strict-align.

2 years agogcc: xtensa: add data alignment properties to dynconfig
Max Filippov [Tue, 28 Feb 2023 13:38:12 +0000 (05:38 -0800)] 
gcc: xtensa: add data alignment properties to dynconfig

gcc/
* config/xtensa/xtensa-dynconfig.cc (xtensa_get_config_v4): New
function.

include/
* xtensa-dynconfig.h (xtensa_config_v4): New struct.
(XCHAL_DATA_WIDTH, XCHAL_UNALIGNED_LOAD_EXCEPTION)
(XCHAL_UNALIGNED_STORE_EXCEPTION, XCHAL_UNALIGNED_LOAD_HW)
(XCHAL_UNALIGNED_STORE_HW, XTENSA_CONFIG_V4_ENTRY_LIST): New
definitions.
(XTENSA_CONFIG_INSTANCE_LIST): Add xtensa_config_v4 instance.
(XTENSA_CONFIG_ENTRY_LIST): Add XTENSA_CONFIG_V4_ENTRY_LIST.

2 years agoc++: Define built-in for std::tuple_element [PR100157]
Patrick Palka [Wed, 19 Apr 2023 19:36:34 +0000 (15:36 -0400)] 
c++: Define built-in for std::tuple_element [PR100157]

This adds a new built-in to replace the recursive class template
instantiations done by traits such as std::tuple_element and
std::variant_alternative.  The purpose is to select the Nth type from a
list of types, e.g. __type_pack_element<1, char, int, float> is int.
We implement it as a special kind of TRAIT_TYPE.

For a pathological example tuple_element_t<1000, tuple<2000 types...>>
the compilation time is reduced by more than 90% and the memory used by
the compiler is reduced by 97%.  In realistic examples the gains will be
much smaller, but still relevant.

Unlike the other built-in traits, __type_pack_element uses template-id
syntax instead of call syntax and is SFINAE-enabled, matching Clang's
implementation.  And like the other built-in traits, it's not mangleable
so we can't use it directly in function signatures.

N.B. Clang seems to implement __type_pack_element as a first-class
template that can e.g. be used as a template-template argument.  For
simplicity we implement it in a more ad-hoc way.

Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
PR c++/100157

gcc/cp/ChangeLog:

* cp-trait.def (TYPE_PACK_ELEMENT): Define.
* cp-tree.h (finish_trait_type): Add complain parameter.
* cxx-pretty-print.cc (pp_cxx_trait): Handle
CPTK_TYPE_PACK_ELEMENT.
* parser.cc (cp_parser_constant_expression): Document default
arguments.
(cp_parser_trait): Handle CPTK_TYPE_PACK_ELEMENT.  Pass
tf_warning_or_error to finish_trait_type.
* pt.cc (tsubst) <case TRAIT_TYPE>: Handle non-type first
argument.  Pass complain to finish_trait_type.
* semantics.cc (finish_type_pack_element): Define.
(finish_trait_type): Add complain parameter.  Handle
CPTK_TYPE_PACK_ELEMENT.
* tree.cc (strip_typedefs): Handle non-type first argument.
Pass tf_warning_or_error to finish_trait_type.
* typeck.cc (structural_comptypes) <case TRAIT_TYPE>: Use
cp_tree_equal instead of same_type_p for the first argument.

libstdc++-v3/ChangeLog:

* include/bits/utility.h (_Nth_type): Conditionally define in
terms of __type_pack_element if available.
* testsuite/20_util/tuple/element_access/get_neg.cc: Prune
additional errors from the new built-in.

gcc/testsuite/ChangeLog:

* g++.dg/ext/type_pack_element1.C: New test.
* g++.dg/ext/type_pack_element2.C: New test.
* g++.dg/ext/type_pack_element3.C: New test.

2 years agoc++: bad ggc_free in try_class_unification [PR109556]
Patrick Palka [Wed, 19 Apr 2023 17:07:46 +0000 (13:07 -0400)] 
c++: bad ggc_free in try_class_unification [PR109556]

Aside from correcting how try_class_unification copies multi-dimensional
'targs', r13-377-g3e948d645bc908 also made it ggc_free this copy as an
optimization.  But this is wrong since the call to unify within might've
captured the args in persistent memory such as the satisfaction cache
(as part of constrained auto deduction).

PR c++/109556

gcc/cp/ChangeLog:

* pt.cc (try_class_unification): Don't ggc_free the copy of
'targs'.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-placeholder13.C: New test.

2 years agotestsuite: fix scan-tree-dump patterns [PR83904,PR100297]
Harald Anlauf [Tue, 18 Apr 2023 19:24:20 +0000 (21:24 +0200)] 
testsuite: fix scan-tree-dump patterns [PR83904,PR100297]

Adjust scan-tree-dump patterns so that they do not accidentally match a
valid path.

gcc/testsuite/ChangeLog:

PR testsuite/83904
PR fortran/100297
* gfortran.dg/allocatable_function_1.f90: Use "__builtin_free "
instead of the naive "free".
* gfortran.dg/reshape_8.f90: Extend pattern from a simple "data".

2 years agoi386: Add new pattern for zero-extend cmov
Andrew Pinski [Thu, 13 Apr 2023 00:40:40 +0000 (00:40 +0000)] 
i386: Add new pattern for zero-extend cmov

After a phiopt change, I got a failure of cmov9.c.
The RTL IR has zero_extend on the outside of
the if_then_else rather than on the side. Both
ways are considered canonical as mentioned in
PR 66588.

This fixes the failure I got and also adds a testcase
which fails before even my phiopt patch but will pass
with this patch.

OK? Bootstrapped and tested on x86_64-linux-gnu with
no regressions.

gcc/ChangeLog:

* config/i386/i386.md (*movsicc_noc_zext_1): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/i386/cmov10.c: New test.
* gcc.target/i386/cmov11.c: New test.

2 years agoc++: fix 'unsigned __int128_t' semantics [PR108099]
Jason Merrill [Tue, 18 Apr 2023 21:12:17 +0000 (17:12 -0400)] 
c++: fix 'unsigned __int128_t' semantics [PR108099]

My earlier patch for 108099 made us accept this non-standard pattern but
messed up the semantics, so that e.g. unsigned __int128_t was not a 128-bit
type.

PR c++/108099

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Keep typedef_decl for __int128_t.

gcc/testsuite/ChangeLog:

* g++.dg/ext/int128-8.C: New test.

2 years agoRISC-V: Support 128 bit vector chunk
Juzhe-Zhong [Wed, 19 Apr 2023 12:33:46 +0000 (20:33 +0800)] 
RISC-V: Support 128 bit vector chunk

RISC-V has provide different VLEN configuration by different ISA
extension like `zve32x`, `zve64x` and `v`
zve32x just guarantee the minimal VLEN is 32 bits,
zve64x guarantee the minimal VLEN is 64 bits,
and v guarantee the minimal VLEN is 128 bits,

Current status (without this patch):

Zve32x: Mode for one vector register mode is VNx1SImode and VNx1DImode
is invalid mode
 - one vector register could hold 1 + 1x SImode where x is 0~n, so it
might hold just one SI

Zve64x: Mode for one vector register mode is VNx1DImode or VNx2SImode
 - one vector register could hold 1 + 1x DImode where x is 0~n, so it
might hold just one DI.
 - one vector register could hold 2 + 2x SImode where x is 0~n, so it
might hold just two SI.

However `v` extension guarantees the minimal VLEN is 128 bits.

We introduce another type/mode mapping for this configure:

v: Mode for one vector register mode is VNx2DImode or VNx4SImode
 - one vector register could hold 2 + 2x DImode where x is 0~n, so it
will hold at least two DI
 - one vector register could hold 4 + 4x SImode where x is 0~n, so it
will hold at least four DI

This patch model the mode more precisely for the RVV, and help some
middle-end optimization that assume number of element must be a
multiple of two.

gcc/ChangeLog:

* config/riscv/riscv-modes.def (FLOAT_MODE): Add chunk 128 support.
(VECTOR_BOOL_MODE): Ditto.
(ADJUST_NUNITS): Ditto.
(ADJUST_ALIGNMENT): Ditto.
(ADJUST_BYTESIZE): Ditto.
(ADJUST_PRECISION): Ditto.
(RVV_MODES): Ditto.
(VECTOR_MODE_WITH_PREFIX): Ditto.
* config/riscv/riscv-v.cc (ENTRY): Ditto.
(get_vlmul): Ditto.
(get_ratio): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TYPE): Ditto.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE): Ditto.
(vbool64_t): Ditto.
(vbool32_t): Ditto.
(vbool16_t): Ditto.
(vbool8_t): Ditto.
(vbool4_t): Ditto.
(vbool2_t): Ditto.
(vbool1_t): Ditto.
(vint8mf8_t): Ditto.
(vuint8mf8_t): Ditto.
(vint8mf4_t): Ditto.
(vuint8mf4_t): Ditto.
(vint8mf2_t): Ditto.
(vuint8mf2_t): Ditto.
(vint8m1_t): Ditto.
(vuint8m1_t): Ditto.
(vint8m2_t): Ditto.
(vuint8m2_t): Ditto.
(vint8m4_t): Ditto.
(vuint8m4_t): Ditto.
(vint8m8_t): Ditto.
(vuint8m8_t): Ditto.
(vint16mf4_t): Ditto.
(vuint16mf4_t): Ditto.
(vint16mf2_t): Ditto.
(vuint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vuint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vuint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vuint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): Ditto.
(vuint32mf2_t): Ditto.
(vint32m1_t): Ditto.
(vuint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vuint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vuint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32m8_t): Ditto.
(vint64m1_t): Ditto.
(vuint64m1_t): Ditto.
(vint64m2_t): Ditto.
(vuint64m2_t): Ditto.
(vint64m4_t): Ditto.
(vuint64m4_t): Ditto.
(vint64m8_t): Ditto.
(vuint64m8_t): Ditto.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vfloat64m1_t): Ditto.
(vfloat64m2_t): Ditto.
(vfloat64m4_t): Ditto.
(vfloat64m8_t): Ditto.
* config/riscv/riscv-vector-switch.def (ENTRY): Ditto.
* config/riscv/riscv.cc (riscv_legitimize_poly_move): Ditto.
(riscv_convert_vector_bits): Ditto.
* config/riscv/riscv.md:
* config/riscv/vector-iterators.md:
* config/riscv/vector.md
(@pred_indexed_<order>store<VNX32_QH:mode><VNX32_QHI:mode>): Ditto.
(@pred_indexed_<order>store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
(@pred_indexed_<order>store<VNX64_Q:mode><VNX64_Q:mode>): Ditto.
(@pred_indexed_<order>store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
(@pred_indexed_<order>store<VNX128_Q:mode><VNX128_Q:mode>): Ditto.
(@pred_reduc_<reduc><mode><vlmul1_zve64>): Ditto.
(@pred_widen_reduc_plus<v_su><mode><vwlmul1_zve64>): Ditto.
(@pred_reduc_plus<order><mode><vlmul1_zve64>): Ditto.
(@pred_widen_reduc_plus<order><mode><vwlmul1_zve64>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr108185-4.c: Adapt testcase.
* gcc.target/riscv/rvv/base/spill-1.c: Ditto.
* gcc.target/riscv/rvv/base/spill-11.c: Ditto.
* gcc.target/riscv/rvv/base/spill-2.c: Ditto.
* gcc.target/riscv/rvv/base/spill-3.c: Ditto.
* gcc.target/riscv/rvv/base/spill-5.c: Ditto.
* gcc.target/riscv/rvv/base/spill-9.c: Ditto.

2 years agoRISC-V: Align IOR optimization MODE_CLASS condition to AND.
Pan Li [Wed, 19 Apr 2023 09:18:20 +0000 (17:18 +0800)] 
RISC-V: Align IOR optimization MODE_CLASS condition to AND.

This patch aligned the MODE_CLASS condition of the IOR to the AND. Then
more MODE_CLASS besides SCALAR_INT can able to perform the optimization
A | (~A) -> -1 similar to AND operator. For example as below sample code.

vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl)
{
  return __riscv_vmorn_mm_b32(v1, v1, vl);
}

Before this patch:
vsetvli  a5,zero,e8,mf4,ta,ma
vlm.v    v24,0(a1)
vsetvli  zero,a2,e8,mf4,ta,ma
vmorn.mm v24,v24,v24
vsetvli  a5,zero,e8,mf4,ta,ma
vsm.v    v24,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf4,ta,ma
vmset.m v24
vsetvli a5,zero,e8,mf4,ta,ma
vsm.v   v24,0(a0)
ret

Or in RTL's perspective,
from:
(ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 [ v1 ])))
to:
(const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])

The similar optimization like VMANDN has enabled already. There should
be no difference execpt the operator when compare the VMORN and VMANDN
for such kind of optimization. The patch aligns the IOR MODE_CLASS condition
of the simplification to the AND operator.

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Align IOR (A | (~A) -> -1) optimization MODE_CLASS condition to AND.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: Update check
condition.
* gcc.target/riscv/simplify_ior_optimization.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2 years agoi386: Emit compares between high registers and memory
Uros Bizjak [Wed, 19 Apr 2023 15:00:52 +0000 (17:00 +0200)] 
i386: Emit compares between high registers and memory

Following code:

typedef __SIZE_TYPE__ size_t;

struct S1s
{
  char pad1;
  char val;
  short pad2;
};

extern char ts[256];

_Bool foo (struct S1s a, size_t i)
{
  return (ts[i] > a.val);
}

compiles with -O2 to:

        movl    %edi, %eax
        movsbl  %ah, %edi
        cmpb    %dil, ts(%rsi)
        setg    %al
        ret

the compare could use high register %ah instead of %dil:

        movl    %edi, %eax
        cmpb    ts(%rsi), %ah
        setl    %al
        ret

Use any_extract code iterator to handle signed and unsigned extracts
from high register and introduce peephole2 patterns to propagate
norex memory opeerand into the compare insn.

gcc/ChangeLog:

PR target/78904
PR target/78952
* config/i386/i386.md (*cmpqi_ext<mode>_1_mem_rex64): New insn pattern.
(*cmpqi_ext<mode>_1): Use nonimmediate_operand predicate
for operand 0. Use any_extract code iterator.
(*cmpqi_ext<mode>_1 peephole2): New peephole2 pattern.
(*cmpqi_ext<mode>_2): Use any_extract code iterator.
(*cmpqi_ext<mode>_3_mem_rex64): New insn pattern.
(*cmpqi_ext<mode>_1): Use general_operand predicate
for operand 1. Use any_extract code iterator.
(*cmpqi_ext<mode>_3 peephole2): New peephole2 pattern.
(*cmpqi_ext<mode>_4): Use any_extract code iterator.

gcc/testsuite/ChangeLog:

PR target/78904
PR target/78952
* gcc.target/i386/pr78952-3.c: New test.

2 years agoaarch64: Factorise widening add/sub high-half expanders with iterators
Kyrylo Tkachov [Wed, 19 Apr 2023 14:43:49 +0000 (15:43 +0100)] 
aarch64: Factorise widening add/sub high-half expanders with iterators

I noticed these define_expand are almost identical modulo some string substitutions.
This patch compresses them together with a couple of code iterators.
No functional change intended.
Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_saddw2<mode>): Delete.
(aarch64_uaddw2<mode>): Delete.
(aarch64_ssubw2<mode>): Delete.
(aarch64_usubw2<mode>): Delete.
(aarch64_<ANY_EXTEND:su><ADDSUB:optab>w2<mode>): New define_expand.

2 years agoUse solve_add_graph_edge in more places
Richard Biener [Tue, 14 Mar 2023 13:39:17 +0000 (14:39 +0100)] 
Use solve_add_graph_edge in more places

The following makes sure to use solve_add_graph_edge and honoring
special-cases, especially edges from escaped, in the remaining places
the solver adds edges.

* tree-ssa-structalias.cc (do_ds_constraint): Use
solve_add_graph_edge.

2 years agoSplit out solve_add_graph_edge
Richard Biener [Tue, 14 Mar 2023 13:38:01 +0000 (14:38 +0100)] 
Split out solve_add_graph_edge

Split out a worker with all the special-casings when adding a graph
edge during solving.

* tree-ssa-structalias.cc (solve_add_graph_edge): New function,
split out from ...
(do_sd_constraint): ... here.

2 years agoRemove odd code from gimple_can_merge_blocks_p
Richard Biener [Wed, 22 Mar 2023 13:13:02 +0000 (14:13 +0100)] 
Remove odd code from gimple_can_merge_blocks_p

The following removes a special case to not merge a block with
only a non-local label.  We have a restriction of non-local labels
to be the first statement (and label) in a block, but otherwise nothing,
if the last stmt of A is a non-local label then it will be still
the first statement of the combined A + B.  In particular we'd
happily merge when there's a stmt after that label.

The check originates from the tree-ssa merge.

Bootstrapped and tested on x86_64-unknown-linux-gnu with all
languages.

* tree-cfg.cc (gimple_can_merge_blocks_p): Remove condition
rejecting the merge when A contains only a non-local label.

2 years agoIntroduce VIRTUAL_REGISTER_P and VIRTUAL_REGISTER_NUM_P predicates
Uros Bizjak [Wed, 19 Apr 2023 13:38:31 +0000 (15:38 +0200)] 
Introduce VIRTUAL_REGISTER_P and VIRTUAL_REGISTER_NUM_P predicates

These two predicates are similar to existing HARD_REGISTER_P and
HARD_REGISTER_NUM_P predicates and return 1 if the given register
corresponds to a virtual register.

gcc/ChangeLog:

* rtl.h (VIRTUAL_REGISTER_P): New predicate.
(VIRTUAL_REGISTER_NUM_P): Ditto.
(REGNO_PTR_FRAME_P): Use VIRTUAL_REGISTER_NUM_P predicate.
* expr.cc (force_operand): Use VIRTUAL_REGISTER_P predicate.
* function.cc (instantiate_decl_rtl): Ditto.
* rtlanal.cc (rtx_addr_can_trap_p_1): Ditto.
(nonzero_address_p): Ditto.
(refers_to_regno_p): Use VIRTUAL_REGISTER_NUM_P predicate.

2 years agoFix pointer sharing in Value_Range constructor.
Aldy Hernandez [Mon, 6 Mar 2023 12:53:15 +0000 (13:53 +0100)] 
Fix pointer sharing in Value_Range constructor.

gcc/ChangeLog:

* value-range.h (Value_Range::Value_Range): Avoid pointer sharing.

2 years agoTransform more gmp/mpfr uses to use RAII
Richard Biener [Wed, 19 Apr 2023 07:45:55 +0000 (09:45 +0200)] 
Transform more gmp/mpfr uses to use RAII

The following picks up the coccinelle generated patch from Bernhard,
leaving out the fortran frontend parts and fixing up the rest.
In particular both gmp.h and mpfr.h contain macros like
  #define mpfr_inf_p(_x)      ((_x)->_mpfr_exp == __MPFR_EXP_INF)
for which I add operator-> overloads to the auto_* classes.

* system.h (auto_mpz::operator->()): New.
* realmpfr.h (auto_mpfr::operator->()): New.
* builtins.cc (do_mpfr_lgamma_r): Use auto_mpfr.
* real.cc (real_from_string): Likewise.
(dconst_e_ptr): Likewise.
(dconst_sqrt2_ptr): Likewise.
* tree-ssa-loop-niter.cc (refine_value_range_using_guard):
Use auto_mpz.
(bound_difference_of_offsetted_base): Likewise.
(number_of_iterations_ne): Likewise.
(number_of_iterations_lt_to_ne): Likewise.
* ubsan.cc: Include realmpfr.h.
(ubsan_instrument_float_cast): Use auto_mpfr.

2 years agoRevert "libstdc++: Export global iostreams with GLIBCXX_3.4.31 symver [PR108969]"
Jonathan Wakely [Wed, 19 Apr 2023 12:12:27 +0000 (13:12 +0100)] 
Revert "libstdc++: Export global iostreams with GLIBCXX_3.4.31 symver [PR108969]"

This reverts commit b7c54e3f48086c29179f7765a35c381de5109a0a.

libstdc++-v3/ChangeLog:

* config/abi/post/aarch64-linux-gnu/baseline_symbols.txt:
* config/abi/post/i486-linux-gnu/baseline_symbols.txt:
* config/abi/post/m68k-linux-gnu/baseline_symbols.txt:
* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt:
* config/abi/post/riscv64-linux-gnu/baseline_symbols.txt:
* config/abi/post/s390x-linux-gnu/baseline_symbols.txt:
* config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt:
* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt:
* config/abi/pre/gnu.ver:
* src/Makefile.am:
* src/Makefile.in:
* src/c++98/Makefile.am:
* src/c++98/Makefile.in:
* src/c++98/globals_io.cc (defined):
(_GLIBCXX_IO_GLOBAL):

2 years agoRevert "libstdc++: Fix preprocessor condition in linker script [PR108969]"
Jonathan Wakely [Wed, 19 Apr 2023 12:12:23 +0000 (13:12 +0100)] 
Revert "libstdc++: Fix preprocessor condition in linker script [PR108969]"

This reverts commit 6067ae4557a3a7e5b08359e78a29b8a9d5dfedce.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver:

2 years agoRemove special-cased edges when solving copies
Richard Biener [Wed, 15 Mar 2023 07:46:04 +0000 (08:46 +0100)] 
Remove special-cased edges when solving copies

The following makes sure to remove the copy edges we ignore or
need to special-case only once.

* tree-ssa-structalias.cc (solve_graph): Remove self-copy
edges, remove edges from escaped after special-casing them.

2 years agoFix do_sd_constraint escape special casing
Richard Biener [Wed, 15 Mar 2023 07:43:06 +0000 (08:43 +0100)] 
Fix do_sd_constraint escape special casing

The following fixes the escape special casing to test the proper
variable IDs.

* tree-ssa-structalias.cc (do_sd_constraint): Fixup escape
special casing.

2 years agoRemove senseless store in do_sd_constraint
Richard Biener [Tue, 14 Mar 2023 13:42:10 +0000 (14:42 +0100)] 
Remove senseless store in do_sd_constraint

* tree-ssa-structalias.cc (do_sd_constraint): Do not write
to the LHS varinfo solution member.

2 years agoAvoid non-unified nodes on the topological sorting for PTA solving
Richard Biener [Tue, 14 Mar 2023 13:39:32 +0000 (14:39 +0100)] 
Avoid non-unified nodes on the topological sorting for PTA solving

Since we do not update successor edges when merging nodes we have
to deal with this in the users.  The following avoids putting those
on the topo order vector.

* tree-ssa-structalias.cc (topo_visit): Look at the real
destination of edges.

2 years agotree-optimization/44794 - avoid excessive RTL unrolling on epilogues
Richard Biener [Thu, 9 Mar 2023 08:02:07 +0000 (09:02 +0100)] 
tree-optimization/44794 - avoid excessive RTL unrolling on epilogues

The following adjusts tree_[transform_and_]unroll_loop to set an
upper bound on the number of iterations on the epilogue loop it
creates.  For the testcase at hand which involves array prefetching
this avoids applying RTL unrolling to them when -funroll-loops is
specified.

Other users of this API includes predictive commoning and
unroll-and-jam.

PR tree-optimization/44794
* tree-ssa-loop-manip.cc (tree_transform_and_unroll_loop):
If an epilogue loop is required set its iteration upper bound.

2 years agoLoongArch: Improve cpymemsi expansion [PR109465]
Xi Ruoyao [Wed, 12 Apr 2023 11:45:48 +0000 (11:45 +0000)] 
LoongArch: Improve cpymemsi expansion [PR109465]

We'd been generating really bad block move sequences which is recently
complained by kernel developers who tried __builtin_memcpy.  To improve
it:

1. Take the advantage of -mno-strict-align.  When it is set, set mode
   size to UNITS_PER_WORD regardless of the alignment.
2. Half the mode size when (block size) % (mode size) != 0, instead of
   falling back to ld.bu/st.b at once.
3. Limit the length of block move sequence considering the number of
   instructions, not the size of block.  When -mstrict-align is set and
   the block is not aligned, the old size limit for straight-line
   implementation (64 bytes) was definitely too large (we don't have 64
   registers anyway).

Change since v1: add a comment about the calculation of num_reg.

gcc/ChangeLog:

PR target/109465
* config/loongarch/loongarch-protos.h
(loongarch_expand_block_move): Add a parameter as alignment RTX.
* config/loongarch/loongarch.h:
(LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER): Remove.
(LARCH_MAX_MOVE_BYTES_STRAIGHT): Remove.
(LARCH_MAX_MOVE_OPS_PER_LOOP_ITER): Define.
(LARCH_MAX_MOVE_OPS_STRAIGHT): Define.
(MOVE_RATIO): Use LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of
LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER.
* config/loongarch/loongarch.cc (loongarch_expand_block_move):
Take the alignment from the parameter, but set it to
UNITS_PER_WORD if !TARGET_STRICT_ALIGN.  Limit the length of
straight-line implementation with LARCH_MAX_MOVE_OPS_STRAIGHT
instead of LARCH_MAX_MOVE_BYTES_STRAIGHT.
(loongarch_block_move_straight): When there are left-over bytes,
half the mode size instead of falling back to byte mode at once.
(loongarch_block_move_loop): Limit the length of loop body with
LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of
LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER.
* config/loongarch/loongarch.md (cpymemsi): Pass the alignment
to loongarch_expand_block_move.

gcc/testsuite/ChangeLog:

PR target/109465
* gcc.target/loongarch/pr109465-1.c: New test.
* gcc.target/loongarch/pr109465-2.c: New test.
* gcc.target/loongarch/pr109465-3.c: New test.

2 years agoLoongArch: Improve GAR store for va_list
Xi Ruoyao [Tue, 28 Mar 2023 17:36:09 +0000 (01:36 +0800)] 
LoongArch: Improve GAR store for va_list

LoongArch backend used to save all GARs for a function with variable
arguments.  But sometimes a function only accepts variable arguments for
a purpose like C++ function overloading.  For example, POSIX defines
open() as:

    int open(const char *path, int oflag, ...);

But only two forms are actually used:

    int open(const char *pathname, int flags);
    int open(const char *pathname, int flags, mode_t mode);

So it's obviously a waste to save all 8 GARs in open().  We can use the
cfun->va_list_gpr_size field set by the stdarg pass to only save the
GARs necessary to be saved.

If the va_list escapes (for example, in fprintf() we pass it to
vfprintf()), stdarg would set cfun->va_list_gpr_size to 255 so we
don't need a special case.

With this patch, only one GAR ($a2/$r6) is saved in open().  Ideally
even this stack store should be omitted too, but doing so is not trivial
and AFAIK there are no compilers (for any target) performing the "ideal"
optimization here, see https://godbolt.org/z/n1YqWq9c9.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk
(GCC 14 or now)?

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_setup_incoming_varargs): Don't save more GARs than
cfun->va_list_gpr_size / UNITS_PER_WORD.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/va_arg.c: New test.

2 years agoAvoid unnecessary epilogues from tree_unroll_loop
Richard Biener [Thu, 9 Mar 2023 09:56:57 +0000 (10:56 +0100)] 
Avoid unnecessary epilogues from tree_unroll_loop

The following fixes the condition determining whether we need an
epilogue.

* tree-ssa-loop-manip.cc (determine_exit_conditions): Fix
no epilogue condition.

2 years agoSimplify gimple_assign_load
Richard Biener [Fri, 16 Dec 2022 12:48:58 +0000 (13:48 +0100)] 
Simplify gimple_assign_load

The following simplifies and outlines gimple_assign_load.  In
particular it is not necessary to get at the base of the possibly
loaded expression but just handle the case of a single handled
component wrapping a non-memory operand.

* gimple.h (gimple_assign_load): Outline...
* gimple.cc (gimple_assign_load): ... here.  Avoid
get_base_address and instead just strip the outermost
handled component, treating a remaining handled component
as load.

2 years agoaarch64: Delete __builtin_aarch64_neg* builtins and their use
Kyrylo Tkachov [Wed, 19 Apr 2023 09:32:07 +0000 (10:32 +0100)] 
aarch64: Delete __builtin_aarch64_neg* builtins and their use

I don't think we need to keep the __builtin_aarch64_neg* builtins around.
They are only used once in the vnegh_f16 intrinsic in arm_fp16.h and I AFAICT
it was added this way only for the sake of orthogonality in
https://gcc.gnu.org/g:d7f33f07d88984cbe769047e3d07fc21067fbba9
We already use normal "-" negation in the other vneg* intrinsics, so do so here as well.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (neg): Delete builtins
definition.
* config/aarch64/arm_fp16.h (vnegh_f16): Reimplement using normal negation.

2 years agotree-vect-patterns: Improve __builtin_{clz,ctz,ffs}ll vectorization [PR109011]
Jakub Jelinek [Wed, 19 Apr 2023 09:14:23 +0000 (11:14 +0200)] 
tree-vect-patterns: Improve __builtin_{clz,ctz,ffs}ll vectorization [PR109011]

For __builtin_popcountll tree-vect-patterns.cc has
vect_recog_popcount_pattern, which improves the vectorized code.
Without that the vectorization is always multi-type vectorization
in the loop (at least int and long long types) where we emit two
.POPCOUNT calls with long long arguments and int return value and then
widen to long long, so effectively after vectorization do the
V?DImode -> V?DImode popcount twice, then pack the result into V?SImode
and immediately unpack.

The following patch extends that handling to __builtin_{clz,ctz,ffs}ll
builtins as well (as long as there is an optab for them; more to come
laster).

x86 can do __builtin_popcountll with -mavx512vpopcntdq, __builtin_clzll
with -mavx512cd, ppc can do __builtin_popcountll and __builtin_clzll
with -mpower8-vector and __builtin_ctzll with -mpower9-vector, s390
can do __builtin_{popcount,clz,ctz}ll with -march=z13 -mzarch (i.e. VX).

2023-04-19  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/109011
* tree-vect-patterns.cc (vect_recog_popcount_pattern): Rename to ...
(vect_recog_popcount_clz_ctz_ffs_pattern): ... this.  Handle also
CLZ, CTZ and FFS.  Remove vargs variable, use
gimple_build_call_internal rather than gimple_build_call_internal_vec.
(vect_vect_recog_func_ptrs): Adjust popcount entry.

* gcc.dg/vect/pr109011-1.c: New test.

2 years agodse: Use SUBREG_REG for copy_to_mode_reg in DSE replace_read for WORD_REGISTER_OPERAT...
Jakub Jelinek [Wed, 19 Apr 2023 09:13:11 +0000 (11:13 +0200)] 
dse: Use SUBREG_REG for copy_to_mode_reg in DSE replace_read for WORD_REGISTER_OPERATIONS targets [PR109040]

While we've agreed this is not the right fix for the PR109040 bug,
the patch clearly improves generated code (at least on the testcase from the
PR), so I'd like to propose this as optimization heuristics improvement
for GCC 14.

2023-04-19  Jakub Jelinek  <jakub@redhat.com>

PR target/109040
* dse.cc (replace_read): If read_reg is a SUBREG of a word mode
REG, for WORD_REGISTER_OPERATIONS copy SUBREG_REG of it into
a new REG rather than the SUBREG.

2 years ago[aarch64] Use wzr/xzr for assigning 0 to vector element.
Prathamesh Kulkarni [Wed, 19 Apr 2023 08:38:40 +0000 (14:08 +0530)] 
[aarch64] Use wzr/xzr for assigning 0 to vector element.

gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>):
New pattern.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vec-set-zero.c: New test.

2 years agoaarch64: PR target/108840 Simplify register shift RTX costs and eliminate shift amoun...
Kyrylo Tkachov [Wed, 19 Apr 2023 08:34:40 +0000 (09:34 +0100)] 
aarch64: PR target/108840 Simplify register shift RTX costs and eliminate shift amount masking

In this PR we fail to eliminate explicit &31 operations for variable shifts such as in:
void
bar (int x[3], int y)
{
  x[0] <<= (y & 31);
  x[1] <<= (y & 31);
  x[2] <<= (y & 31);
}

This is rejected by RTX costs that end up giving too high a cost for:
(set (reg:SI 96)
    (ashift:SI (reg:SI 98)
        (subreg:QI (and:SI (reg:SI 99)
                (const_int 31 [0x1f])) 0)))

There is code to handle the AND-31 case in rtx costs, but it gets confused by the subreg.
It's easy enough to fix by looking inside the subreg when costing the expression.
While doing that I noticed that the ASHIFT case and the other shift-like cases are almost identical
and we should just merge them. This code will only be used for valid insns anyway, so the code after this
patch should do the Right Thing (TM) for all such shift cases.

With this patch there are no more "and wn, wn, 31" instructions left in the testcase.

Bootstrapped and tested on aarch64-none-linux-gnu.

PR target/108840

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_rtx_costs): Merge ASHIFT and
ROTATE, ROTATERT, LSHIFTRT, ASHIFTRT cases.  Handle subregs in op1.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr108840.c: New test.

2 years agortl-optimization/109237 - quadraticness in delete_trivially_dead_insns
Richard Biener [Wed, 22 Mar 2023 08:29:49 +0000 (09:29 +0100)] 
rtl-optimization/109237 - quadraticness in delete_trivially_dead_insns

The following addresses quadraticness in processing debug insns
in delete_trivially_dead_insns and insn_live_p by using TREE_VISITED
on the INSN_VAR_LOCATION_DECL to indicate a later debug bind
with the same decl and no intervening real insn or debug marker.
That gets rid of the NEXT_INSN walk in insn_live_p in favor of
first clearing TREE_VISITED in the first loop over insn and
the book-keeping of decls we set the bit since we need to clear
them when visiting a real or debug marker insn.

That improves the time spent in delete_trivially_dead_insns from
10.6s to 2.2s for the testcase.

PR rtl-optimization/109237
* cse.cc (insn_live_p): Remove NEXT_INSN walk, instead check
TREE_VISITED on INSN_VAR_LOCATION_DECL.
(delete_trivially_dead_insns): Maintain TREE_VISITED on
active debug bind INSN_VAR_LOCATION_DECL.

2 years agortl-optimization/109237 - speedup bb_is_just_return
Richard Biener [Wed, 22 Mar 2023 09:05:19 +0000 (10:05 +0100)] 
rtl-optimization/109237 - speedup bb_is_just_return

For the testcase bb_is_just_return is on top of the profile, changing
it to walk BB insns backwards puts it off the profile.  That's because
in the forward walk you have to process possibly many debug insns
but in a backward walk you very likely run into control insns first.

PR rtl-optimization/109237
* cfgcleanup.cc (bb_is_just_return): Walk insns backwards.

2 years agotestsuite: Fix up pr109524.C for -std=c++23 [PR109524]
Jakub Jelinek [Wed, 19 Apr 2023 08:01:04 +0000 (10:01 +0200)] 
testsuite: Fix up pr109524.C for -std=c++23 [PR109524]

This testcase was reduced such that it isn't valid C++23, so with my
usual testing with GXX_TESTSUITE_STDS=98,11,14,17,20,2b it fails:
FAIL: g++.dg/pr109524.C  -std=gnu++2b (test for excess errors)
.../gcc/testsuite/g++.dg/pr109524.C: In function 'nn hh(nn)':
.../gcc/testsuite/g++.dg/pr109524.C:35:12: error: cannot bind non-const lvalue reference of type 'nn&' to an rvalue of type 'nn'
.../gcc/testsuite/g++.dg/pr109524.C:17:6: note:   initializing argument 1 of 'nn::nn(nn&)'
The following patch fixes that and I've verified it doesn't change
anything on what the test was testing, it still ICEs in r13-7198 and
passes in r13-7203, now in all language modes (except for 98 where
it is intentionally UNSUPPORTED).

2023-04-19  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/109524
* g++.dg/pr109524.C (nn::nn): Change argument type from nn & to
const nn &.

2 years agoinstall.texi: Document --enable-decimal-float for AArch64
Christophe Lyon [Mon, 17 Apr 2023 17:23:54 +0000 (19:23 +0200)] 
install.texi: Document --enable-decimal-float for AArch64

When I committed the patches to enable support for DFP on AArch64, I
forgot to update the installation documentation.

This patch adds AArch64 as needed (same as i386/x86_64).

2023-04-17  Christophe Lyon  <christophe.lyon@arm.com>

gcc/
* doc/install.texi (enable-decimal-float): Add AArch64.

2 years agoCheck hard_regno_mode_ok before setting lowest memory move cost for the mode with...
liuhongt [Mon, 3 Apr 2023 02:54:55 +0000 (10:54 +0800)] 
Check hard_regno_mode_ok before setting lowest memory move cost for the mode with different reg classes.

There's a potential performance issue when backend returns some
unreasonable value for the mode which can be never be allocate with
reg class.

gcc/ChangeLog:

PR rtl-optimization/109351
* ira.cc (setup_class_subset_and_memory_move_costs): Check
hard_regno_mode_ok before setting lowest memory move cost for
the mode with different reg classes.

2 years agoDaily bump.
GCC Administrator [Wed, 19 Apr 2023 00:17:36 +0000 (00:17 +0000)] 
Daily bump.

2 years agolibstdc++: Adjust uses of null pointer constants in docs
Jonathan Wakely [Tue, 18 Apr 2023 23:07:36 +0000 (00:07 +0100)] 
libstdc++: Adjust uses of null pointer constants in docs

libstdc++-v3/ChangeLog:

* doc/xml/manual/extensions.xml: Fix example to declare and
qualify std::free, and use NULL instead of 0.
* doc/html/manual/ext_demangling.html: Regenerate.
* libsupc++/cxxabi.h: Adjust doxygen comments.

2 years agodoc: remove stray @gol
Jason Merrill [Tue, 18 Apr 2023 20:28:24 +0000 (16:28 -0400)] 
doc: remove stray @gol

@gol was removed in r13-6778, new doc additions can't use it.

gcc/ChangeLog:

* doc/invoke.texi: Remove stray @gol.

2 years agoifcvt.cc: Prevent excessive if-conversion for conditional moves
Takayuki 'January June' Suwa [Tue, 18 Apr 2023 20:11:09 +0000 (14:11 -0600)] 
ifcvt.cc: Prevent excessive if-conversion for conditional moves

gcc/
* ifcvt.cc (cond_move_process_if_block): Consider the result of
targetm.noce_conversion_profitable_p() when replacing the original
sequence with the converted one.

2 years agoAdd -gcodeview option
Mark Harmstone [Tue, 18 Apr 2023 19:55:35 +0000 (13:55 -0600)] 
Add -gcodeview option

gcc/
* common.opt (gcodeview): Add new option.
* gcc.cc (driver_handle_option); Handle OPT_gcodeview.
* opts.cc (command_handle_option): Similarly.
* doc/invoke.texi: Add documentation for -gcodeview.

2 years agoPHIOPT: Move tree_ssa_cs_elim into pass_cselim::execute.
Andrew Pinski [Fri, 31 Mar 2023 00:00:20 +0000 (00:00 +0000)] 
PHIOPT: Move tree_ssa_cs_elim into pass_cselim::execute.

This moves around the code for tree_ssa_cs_elim slightly
improving code readability and removing declarations that
are no longer needed.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove declaration.
(make_pass_phiopt): Make execute out of line.
(tree_ssa_cs_elim): Move code into ...
(pass_cselim::execute): here.

2 years agogcc: Drop obsolete INCLUDE_PTHREAD_H
Sam James [Tue, 18 Apr 2023 19:27:38 +0000 (13:27 -0600)] 
gcc: Drop obsolete INCLUDE_PTHREAD_H

gcc/ChangeLog:
* system.h: Drop unused INCLUDE_PTHREAD_H.

2 years agovect: Verify that GET_MODE_UNITS is greater than one for vect_grouped_store_supported
Kevin Lee [Tue, 18 Apr 2023 18:42:17 +0000 (12:42 -0600)] 
vect: Verify that GET_MODE_UNITS is greater than one for vect_grouped_store_supported

gcc/ChangeLog:

* tree-vect-data-refs.cc (vect_grouped_store_supported): Add new
condition.

2 years agoAdd TARGET_ZBKB to the condition of bswapsi2, bswapdi2 and rotr<mode>3 patterns
Sinan Lin [Tue, 18 Apr 2023 18:24:52 +0000 (12:24 -0600)] 
Add TARGET_ZBKB to the condition of bswapsi2, bswapdi2 and rotr<mode>3 patterns

gcc/
* config/riscv/bitmanip.md (rotr<mode>3 expander): Enable for ZBKB.
(bswapdi2, bswapsi2): Similarly.

2 years agoi386: Improve permutations with INSERTPS instruction [PR94908]
Uros Bizjak [Tue, 18 Apr 2023 15:50:37 +0000 (17:50 +0200)] 
i386: Improve permutations with INSERTPS instruction [PR94908]

INSERTPS can select any element from src and insert into any place
of the dest.  For SSE4.1 targets, compiler can generate e.g.

insertps $64, %xmm0, %xmm1

to insert element 1 from %xmm1 to element 0 of %xmm0.

gcc/ChangeLog:

PR target/94908
* config/i386/i386-builtin.def (__builtin_ia32_insertps128):
Use CODE_FOR_sse4_1_insertps_v4sf.
* config/i386/i386-expand.cc (expand_vec_perm_insertps): New.
(expand_vec_perm_1): Call expand_vec_per_insertps.
* config/i386/i386.md ("unspec"): Declare UNSPEC_INSERTPS here.
* config/i386/mmx.md (mmxscalarmode): New mode attribute.
(@sse4_1_insertps_<mode>): New insn pattern.
* config/i386/sse.md (@sse4_1_insertps_<mode>): Macroize insn
pattern from sse4_1_insertps using VI4F_128 mode iterator.

gcc/testsuite/ChangeLog:

PR target/94908
* gcc.target/i386/pr94908.c: New test.
* gcc.target/i386/sse4_1-insertps-5.c: New test.
* gcc.target/i386/vperm-v4sf-2-sse4.c: New test.

2 years agolibstdc++: Fix preprocessor condition in linker script [PR108969]
Jonathan Wakely [Tue, 18 Apr 2023 16:22:40 +0000 (17:22 +0100)] 
libstdc++: Fix preprocessor condition in linker script [PR108969]

The linker script is preprocessed with $(top_builddir)/config.h not the
include/$target/bits/c++config.h version, which means that configure
macros do not have the _GLIBCXX_ prefix yet.

The _GLIBCXX_SYMVER_GNU and _GLIBCXX_SHARED checks are redundant,
because the gnu.ver file is only used for _GLIBCXX_SYMVER_GNU and the
linker script is only used for the shared library. Remove those.

libstdc++-v3/ChangeLog:

PR libstdc++/108969
* config/abi/pre/gnu.ver: Fix preprocessor condition.

2 years agoAdd GTY support for vrange.
Aldy Hernandez [Thu, 23 Feb 2023 08:10:16 +0000 (09:10 +0100)] 
Add GTY support for vrange.

IPA currently puts *some* irange's in GC memory.  When I contribute
support for generic ranges in IPA, we'll need to change this to
vrange.  This patch adds GTY support for both vrange and frange.

gcc/ChangeLog:

* value-range.cc (gt_ggc_mx): New.
(gt_pch_nx): New.
* value-range.h (class vrange): Add GTY marker.
(class frange): Same.
(gt_ggc_mx): Remove.
(gt_pch_nx): Remove.

2 years agoconstraint: fix relaxed memory and repeated constraint handling
Victor L. Do Nascimento [Tue, 18 Apr 2023 16:11:58 +0000 (17:11 +0100)] 
constraint: fix relaxed memory and repeated constraint handling

The function `constrain_operands' lacked the logic to consider relaxed
memory constraints when "traditional" memory constraints were not
satisfied, creating potential issues as observed during the reload
compilation pass.

In addition, it was observed that while `constrain_operands' chooses
to disregard constraints when more than one alternative is provided,
e.g. "m,r" using CONSTRAINT__UNKNOWN, it has no checks in place to
determine whether the multiple constraints in a given string are in
fact repetitions of the same constraint and should thus in fact be
treated as a single constraint, as ought to be the case for something
like "m,m".

Both of these issues are dealt with here, thus ensuring that we get
appropriate pattern matching.

gcc/
* lra-constraints.cc (constraint_unique): New.
(process_address_1): Apply constraint_unique test.
* recog.cc (constrain_operands): Allow relaxed memory
constaints.

2 years agolibstdc++: Export global iostreams with GLIBCXX_3.4.31 symver [PR108969]
Jonathan Wakely [Tue, 18 Apr 2023 13:37:38 +0000 (14:37 +0100)] 
libstdc++: Export global iostreams with GLIBCXX_3.4.31 symver [PR108969]

Since GCC 13 the global iostream objects are only initialized once in
libstdc++, and not by a std::ios::Init object in every translation unit
that includes <iostream>. To avoid using uninitialized streams defined
in an older libstdc++.so, translation units using the global iostreams
should depend on the GLIBCXX_3.4.31 symver.

Define std::cin as std::__io::cin and then export it as
std::cin@@GLIBCXX_3.4.31 so that references to std::cin bind to the new
symver. Also export it as @GLIBCXX_3.4 for backwards compatibility

libstdc++-v3/ChangeLog:

PR libstdc++/108969
* src/Makefile.am: Move globals_io.cc to here.
* src/Makefile.in: Regenerate.
* src/c++98/Makefile.am: Remove globals_io.cc from here.
* src/c++98/Makefile.in: Regenerate.
* src/c++98/globals_io.cc [_GLIBCXX_SYMVER_GNU] (cin): Adjust
symbol name and then export with GLIBCXX_3.4.31 symver.
(cout, cerr, clog, wcin, wcout, wcerr, wclog): Likewise.
* config/abi/post/aarch64-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/i486-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/m68k-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/riscv64-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt:
Regenerate.
* config/abi/post/s390x-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt:
Regenerate.
* config/abi/pre/gnu.ver: Add iostream objects to new symver.

2 years agoDocs: Add doc for RISC-V vector intrinsics
Kito Cheng [Tue, 18 Apr 2023 10:07:06 +0000 (18:07 +0800)] 
Docs: Add doc for RISC-V vector intrinsics

Document which version of RISC-V vector intrinsics has implemented in
GCC.

gcc/ChangeLog:

* doc/extend.texi (Target Builtins): Add RISC-V Vector
Intrinsics.
(RISC-V Vector Intrinsics): Document GCC implemented which
version of RISC-V vector intrinsics and its reference.

2 years agomiddle-end/108786 - add bitmap_clear_first_set_bit
Richard Biener [Tue, 14 Feb 2023 15:36:03 +0000 (16:36 +0100)] 
middle-end/108786 - add bitmap_clear_first_set_bit

This adds bitmap_clear_first_set_bit and uses it where previously
bitmap_clear_bit followed bitmap_first_set_bit.  The advantage
is speeding up the search and avoiding to clobber ->current.

PR middle-end/108786
* bitmap.h (bitmap_clear_first_set_bit): New.
* bitmap.cc (bitmap_first_set_bit_worker): Rename from
bitmap_first_set_bit and add optional clearing of the bit.
(bitmap_first_set_bit): Wrap bitmap_first_set_bit_worker.
(bitmap_clear_first_set_bit): Likewise.
* df-core.cc (df_worklist_dataflow_doublequeue): Use
bitmap_clear_first_set_bit.
* graphite-scop-detection.cc (scop_detection::merge_sese):
Likewise.
* sanopt.cc (sanitize_asan_mark_unpoison): Likewise.
(sanitize_asan_mark_poison): Likewise.
* tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Likewise.
* tree-into-ssa.cc (rewrite_blocks): Likewise.
* tree-ssa-dce.cc (simple_dce_from_worklist): Likewise.
* tree-ssa-sccvn.cc (do_rpo_vn_1): Likewise.

2 years agoShrink points-to analysis dumps when not dumping with -details
Richard Biener [Fri, 10 Mar 2023 11:23:09 +0000 (12:23 +0100)] 
Shrink points-to analysis dumps when not dumping with -details

The following allows to get PTA stats with -stats without blowing
up your filesystem by guarding constraint and solution dumping
with TDF_DETAILS and the SSA points-to info with TDF_DETAILS
or TDF_ALIAS.

* tree-ssa-structalias.cc (dump_sa_stats): Split out from...
(dump_sa_points_to_info): ... this function.
(compute_points_to_sets): Guard large dumps with TDF_DETAILS,
and call dump_sa_stats guarded with TDF_STATS.
(ipa_pta_execute): Likewise.
(compute_may_aliases): Guard dump_alias_info with
TDF_DETAILS|TDF_ALIAS.

* gcc.dg/ipa/ipa-pta-16.c: Use -details for dump.
* gcc.dg/tm/alias-1.c: Likewise.
* gcc.dg/tm/alias-2.c: Likewise.
* gcc.dg/torture/ipa-pta-1.c: Likewise.
* gcc.dg/torture/pr39074-2.c: Likewise.
* gcc.dg/torture/pr39074.c: Likewise.
* gcc.dg/torture/pta-callused-1.c: Likewise.
* gcc.dg/torture/pta-escape-1.c: Likewise.
* gcc.dg/torture/pta-ptrarith-1.c: Likewise.
* gcc.dg/torture/pta-ptrarith-2.c: Likewise.
* gcc.dg/torture/pta-ptrarith-3.c: Likewise.
* gcc.dg/torture/pta-structcopy-1.c: Likewise.
* gcc.dg/torture/ssa-pta-fn-1.c: Likewise.
* gcc.dg/tree-ssa/alias-19.c: Likewise.
* gcc.dg/tree-ssa/pta-callused.c: Likewise.
* gcc.dg/tree-ssa/pta-fp.c: Likewise.
* gcc.dg/tree-ssa/pta-ptrarith-1.c: Likewise.
* gcc.dg/tree-ssa/pta-ptrarith-2.c: Likewise.

2 years agoPHIOPT: add folding/simplification detail to the dump
Andrew Pinski [Tue, 4 Apr 2023 00:09:27 +0000 (00:09 +0000)] 
PHIOPT: add folding/simplification detail to the dump

While debugging PHI-OPT with match-and-simplify,
I found that adding more dumping to the debug dumps made
it easier to understand what was going on rather than stepping in
the debugger so this adds them. Note I used TDF_FOLDING rather
than TDF_DETAILS as these debug messages can be chatty and
only needed if you are debugging match and simplify
with PHI-OPT and match and simplify uses TDF_FOLDING as
its check.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (gimple_simplify_phiopt): Dump
the expression that is being tried when TDF_FOLDING
is true.
(phiopt_worker::match_simplify_replacement): Dump
the sequence which was created by gimple_simplify_phiopt
when TDF_FOLDING is true.

2 years agoPHIOPT: small cleanup in match_simplify_replacement
Andrew Pinski [Sun, 9 Apr 2023 02:03:38 +0000 (19:03 -0700)] 
PHIOPT: small cleanup in match_simplify_replacement

We know that the statement we are moving is already
have a SSA_NAME on the lhs so we don't need to
check that and can also just call reset_flow_sensitive_info
with the name we already got.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (match_simplify_replacement):
Simplify code that does the movement slightly.

2 years agoaarch64: Use standard RTL codes for __rev16 intrinsic expansion
Kyrylo Tkachov [Tue, 18 Apr 2023 14:06:49 +0000 (15:06 +0100)] 
aarch64: Use standard RTL codes for __rev16 intrinsic expansion

I noticed for the expansion of the __rev16* arm_acle.h intrinsics we don't need to use an unspec just because it doesn't match neatly to a bswap code.
We have organic combine patterns for it that we can reuse.
This patch removes the define_insn using UNSPEC_REV (should it have been an UNSPEC_REV16?) and adds an expander to emit
the patterns we have for rev16 using standard RTL codes.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64.md (@aarch64_rev16<mode>): Change to
define_expand.
(rev16<mode>2): Rename to...
(aarch64_rev16<mode>2_alt1): ... This.
(rev16<mode>2_alt): Rename to...
(*aarch64_rev16<mode>2_alt2): ... This.

2 years agoDeclare dconstm0 to go along with dconst0 and friends.
Aldy Hernandez [Wed, 12 Apr 2023 12:05:57 +0000 (14:05 +0200)] 
Declare dconstm0 to go along with dconst0 and friends.

Negating dconst0 is getting pretty old, and we will keep adding copies
of the same idiom.  Fixed by adding a dconstm0 constant to go along
with dconst1, dconstm1, etc.

gcc/ChangeLog:

* emit-rtl.cc (init_emit_once): Initialize dconstm0.
* gimple-range-op.cc (class cfn_signbit): Remove dconstm0
declaration.
* range-op-float.cc (zero_range): Use dconstm0.
(zero_to_inf_range): Same.
* real.h (dconstm0): New.
* value-range.cc (frange::flush_denormals_to_zero): Use dconstm0.
(frange::set_zero): Do not declare dconstm0.

2 years agoRAII auto_mpfr and autp_mpz
Richard Biener [Mon, 6 Mar 2023 10:06:38 +0000 (11:06 +0100)] 
RAII auto_mpfr and autp_mpz

The following adds two RAII classes, one for mpz_t and one for mpfr_t
making object lifetime management easier.  Both formerly require
explicit initialization with {mpz,mpfr}_init and release with
{mpz,mpfr}_clear.

I've converted two example places (where lifetime is trivial).

* system.h (class auto_mpz): New,
* realmpfr.h (class auto_mpfr): Likewise.
* fold-const-call.cc (do_mpfr_arg1): Use auto_mpfr.
(do_mpfr_arg2): Likewise.
* tree-ssa-loop-niter.cc (bound_difference): Use auto_mpz;

2 years agoaarch64: Use intrinsic flags information rather than hardcoding FLAG_AUTO_FP
Kyrylo Tkachov [Tue, 18 Apr 2023 13:36:14 +0000 (14:36 +0100)] 
aarch64: Use intrinsic flags information rather than hardcoding FLAG_AUTO_FP

We record the flags to use for the intrinsics in aarch64_simd_intrinsic_data, so use it when initialising them
rather than using a hardcoded FLAG_AUTO_FP. The current vreinterpret intrinsics use FLAG_AUTO_FP anyway so this
patch is an NFC but this will be needed as we migrate more builtins into the intrinsics infrastructure.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (aarch64_init_simd_intrinsics): Take
builtin flags from intrinsic data rather than hardcoded FLAG_AUTO_FP.

2 years agoFixed typo.
Jin Ma [Tue, 18 Apr 2023 13:28:22 +0000 (07:28 -0600)] 
Fixed typo.

gcc/ada
* gcc-interface/utils.cc (unchecked_convert): Fixed typo.

2 years agoReturn true from operator== for two identical ranges containing NAN.
Aldy Hernandez [Thu, 26 Jan 2023 03:46:54 +0000 (04:46 +0100)] 
Return true from operator== for two identical ranges containing NAN.

The == operator for ranges signifies that two ranges contain the same
thing, not that they are ultimately equal.  So [2,4] == [2,4], even
though one may be a 2 and the other may be a 3.  Similarly with two
VARYING ranges.

There is an oversight in frange::operator== where we are returning
false for two identical NANs.  This is causing us to never cache NANs
in sbr_sparse_bitmap::set_bb_range.

gcc/ChangeLog:

* value-range.cc (frange::operator==): Adjust for NAN.
(range_tests_nan): Remove some NAN tests.

2 years agoAdd inchash support for vrange.
Aldy Hernandez [Thu, 23 Feb 2023 07:48:28 +0000 (08:48 +0100)] 
Add inchash support for vrange.

This patch provides inchash support for vrange.  It is along the lines
of the streaming support I just posted and will be used for IPA
hashing of ranges.

gcc/ChangeLog:

* inchash.cc (hash::add_real_value): New.
* inchash.h (class hash): Add add_real_value.
* value-range.cc (add_vrange): New.
* value-range.h (inchash::add_vrange): New.