Robin Dapp [Fri, 27 Oct 2023 11:58:05 +0000 (13:58 +0200)] 
RISC-V: Add vector fmin/fmax expanders.

This patch adds expanders for fmin and fmax.  As per the RISC-V V Spec 1.0,
vfmin/vfmax are IEEE 754-2019 compliant, which differs from the IEEE 754-2008
semantics that fmin/fmax require (particularly in the signaling-NaN handling).
Therefore the pattern conditions include a !HONOR_SNANS.
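
As an illustration only, the quiet-NaN behaviour expected of fmax can be
modelled in scalar C.  This is a sketch, not code from the patch: ref_fmaxf,
quiet_nan and vec_ref_fmaxf are hypothetical names, and the model deliberately
ignores signaling NaNs, which is the case the !HONOR_SNANS condition excludes.

```c
#include <stdbool.h>

/* Hypothetical helper: produce a quiet NaN without needing <math.h>.  */
float
quiet_nan (void)
{
  volatile float zero = 0.0f;
  return zero / zero;
}

/* Hypothetical scalar model of fmaxf for quiet NaNs only: if exactly
   one operand is NaN, the other operand is returned.  */
float
ref_fmaxf (float a, float b)
{
  bool a_is_nan = a != a;
  bool b_is_nan = b != b;

  if (a_is_nan)
    return b;
  if (b_is_nan)
    return a;
  return a > b ? a : b;
}

/* The shape of loop that an fmax expander is meant to help vectorize.  */
void
vec_ref_fmaxf (float *restrict dst, const float *restrict a,
               const float *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = ref_fmaxf (a[i], b[i]);
}
```

The model says nothing about how a signaling NaN is treated; that is exactly
the difference between the two standard revisions mentioned above.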

gcc/ChangeLog:

* config/riscv/autovec.md (<ieee_fmaxmin_op><mode>3): fmax/fmin
expanders.
(cond_<ieee_fmaxmin_op><mode>): Ditto.
(cond_len_<ieee_fmaxmin_op><mode>): Ditto.
(reduc_fmax_scal_<mode>): Ditto.
(reduc_fmin_scal_<mode>): Ditto.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add fmin/fmax.
* config/riscv/vector-iterators.md (fmin): New UNSPEC.
(UNSPEC_VFMIN): Ditto.
* config/riscv/vector.md (@pred_<ieee_fmaxmin_op><mode>): Add
UNSPEC insn patterns.
(@pred_<ieee_fmaxmin_op><mode>_scalar): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Remove
-ffast-math.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/fmax-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmax_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmax_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmax_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/fmin_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_zvfh_run-10.c: New test.

Robin Dapp [Thu, 12 Oct 2023 09:23:26 +0000 (11:23 +0200)] 
genemit: Split insn-emit.cc into several partitions.

On riscv, insn-emit.cc has grown to over 1.2 million lines of code and
compiling it takes considerable time.
Therefore, this patch adjusts genemit to create several partitions
(insn-emit-1.cc to insn-emit-n.cc).  The available patterns are
written to the given files in a sequential fashion.

Similar to match.pd, a configure option --with-insnemit-partitions=num
is introduced that makes the number of partitions configurable.
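
Writing patterns to the files "in a sequential fashion" could be sketched
as below.  This is an assumption about the general idea, not the logic in
genemit.cc: partition_for_pattern and its rounding are hypothetical.

```c
/* Hypothetical helper: map pattern number IDX (0-based) out of TOTAL
   patterns to one of NPARTS output files, filling the files in order.
   Result 0 stands for insn-emit-1.cc, 1 for insn-emit-2.cc, and so on.  */
int
partition_for_pattern (int idx, int total, int nparts)
{
  /* Round up so that NPARTS files are always enough.  */
  int per_file = (total + nparts - 1) / nparts;

  if (per_file == 0)
    return 0;   /* No patterns at all; everything goes to the first file.  */
  return idx / per_file;
}
```

With 100 patterns and 10 partitions, patterns 0 to 9 land in the first file
and pattern 99 in the last one.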

gcc/ChangeLog:

PR bootstrap/84402
PR target/111600

* Makefile.in: Handle split insn-emit.cc.
* configure: Regenerate.
* configure.ac: Add --with-insnemit-partitions.
* genemit.cc (output_peephole2_scratches): Print to file instead
of stdout.
(print_code): Ditto.
(gen_rtx_scratch): Ditto.
(gen_exp): Ditto.
(gen_emit_seq): Ditto.
(emit_c_code): Ditto.
(gen_insn): Ditto.
(gen_expand): Ditto.
(gen_split): Ditto.
(output_add_clobbers): Ditto.
(output_added_clobbers_hard_reg_p): Ditto.
(print_overload_arguments): Ditto.
(print_overload_test): Ditto.
(handle_overloaded_code_for): Ditto.
(handle_overloaded_gen): Ditto.
(print_header): New function.
(handle_arg): New function.
(main): Split output into 10 files.
* gensupport.cc (count_patterns): New function.
* gensupport.h (count_patterns): Define.
* read-md.cc (md_reader::print_md_ptr_loc): Add file argument.
* read-md.h (class md_reader): Change definition.

Alexandre Oliva [Tue, 31 Oct 2023 12:32:08 +0000 (09:32 -0300)] 
hardcfr: support checking at abnormal edges [PR111943]

Control flow redundancy may choose abnormal edges for early checking,
but that breaks because we can't insert checks on such edges.

Introduce conditional checking on the dest block of abnormal edges,
and leave it for the optimizer to drop the conditional.

for  gcc/ChangeLog

PR tree-optimization/111943
* gimple-harden-control-flow.cc: Adjust copyright year.
(rt_bb_visited): Add vfalse and vtrue data members.
Zero-initialize them in the ctor.
(rt_bb_visited::insert_exit_check_on_edge): Upon encountering
abnormal edges, insert initializers for vfalse and vtrue on
entry, and insert the check sequence guarded by a conditional
in the dest block.

for  libgcc/ChangeLog

* hardcfr.c: Adjust copyright year.

for  gcc/testsuite/ChangeLog

PR tree-optimization/111943
* gcc.dg/harden-cfr-pr111943.c: New.

Richard Biener [Tue, 31 Oct 2023 09:13:13 +0000 (10:13 +0100)] 
tree-optimization/112305 - SCEV cprop and conditional undefined overflow

The following adjusts final value replacement to also rewrite the
replacement to defined overflow behavior if there are conditionally
evaluated stmts (with possibly undefined overflow), not only when
we "folded casts".  The patch hooks into expression_expensive for
this.
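
For readers unfamiliar with "rewriting to defined overflow", the idea can be
shown in plain C.  This is a minimal sketch with hypothetical function names,
not the GIMPLE the pass produces.

```c
/* Signed addition: overflow is undefined behavior, so this expression
   must not be evaluated unconditionally if the original code only
   evaluated it under a condition.  */
int
add_undefined_overflow (int a, int b)
{
  return a + b;
}

/* The same computation carried out in unsigned arithmetic, which wraps
   on overflow.  The conversion back to int is implementation-defined
   in ISO C; GCC documents it as wrapping modulo 2^N.  */
int
add_defined_overflow (int a, int b)
{
  return (int) ((unsigned) a + (unsigned) b);
}
```

Both functions agree whenever the signed sum does not overflow, but only the
second is safe to hoist out of a conditional.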

PR tree-optimization/112305
* tree-scalar-evolution.h (expression_expensive): Adjust.
* tree-scalar-evolution.cc (expression_expensive): Record
when we see a COND_EXPR.
(final_value_replacement_loop): When the replacement contains
a COND_EXPR, rewrite it to defined overflow.
* tree-ssa-loop-ivopts.cc (may_eliminate_iv): Adjust.

* gcc.dg/torture/pr112305.c: New testcase.

Iain Buclaw [Tue, 31 Oct 2023 11:20:02 +0000 (12:20 +0100)] 
d: Clean-up unused variable assignments after interface change

The lowering done for invoking `new' on a single dimension array was
moved from the code generator to the front-end semantic pass in
r14-4996.  This removes the detritus left behind in the code generator
from that deletion.

gcc/d/ChangeLog:

* expr.cc (ExprVisitor::visit (NewExp *)): Remove unused assignments.

Xi Ruoyao [Mon, 30 Oct 2023 11:39:27 +0000 (19:39 +0800)] 
LoongArch: Define HAVE_AS_TLS to 0 if it's undefined [PR112299]

Now that loongarch.md uses HAVE_AS_TLS, we need this to fix the failure
when building a cross compiler if the cross assembler is not installed yet.

gcc/ChangeLog:

PR target/112299
* config/loongarch/loongarch-opts.h (HAVE_AS_TLS): Define to 0
if not defined yet.

Lehua Ding [Tue, 31 Oct 2023 03:50:42 +0000 (11:50 +0800)] 
RISC-V: Add assert of the number of vmerge in autovec cond testcases

This patch adds more asserts about the vmerge insns, which are intended
to ensure better performance for cond autovec.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_arith-1.c: Add vmerge assert.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_min_max-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-10.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-11.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul_run-5.c: New test.

Lehua Ding [Tue, 19 Sep 2023 07:53:54 +0000 (15:53 +0800)] 
match.pd: Support combine cond_len_op + vec_cond similar to cond_op

This patch adds support for combining cond_len_op and vec_cond into a
single cond_len_op, as is already done for cond_op.

Consider this code (RISC-V target):
  void
  foo (uint8_t *__restrict x, uint8_t *__restrict y, uint8_t *__restrict z,
       uint8_t *__restrict pred, uint8_t *__restrict merged, int n)
  {
    for (int i = 0; i < n; ++i)
      x[i] = pred[i] != 1 ? y[i] / z[i] : merged[i];
  }

Before this patch:
  ...
  vect_iftmp.18_71 = .COND_LEN_DIV (mask__31.11_61, vect__5.14_65, vect__7.17_69, { 0, ... }, _86, 0);
  vect_iftmp.23_78 = .VCOND_MASK (mask__31.11_61, vect_iftmp.18_71, vect_iftmp.22_77);
  ...

After this patch:
  ...
  _30 = .COND_LEN_DIV (mask__31.16_61, vect__5.19_65, vect__7.22_69, vect_iftmp.27_77, _85, 0);
  ...
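
A scalar reference model may help when reading the dumps above.  It is a
sketch under the assumption that .COND_LEN_DIV follows the usual cond_len
semantics (lanes below len + bias with a set mask bit compute the operation,
all other lanes take the "else" operand); ref_cond_len_div and its parameter
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model of
     res = .COND_LEN_DIV (mask, a, b, else_vals, len, bias)
   over NUNITS lanes.  The caller must ensure b[i] != 0 in active lanes.  */
void
ref_cond_len_div (uint8_t *res, const bool *mask, const uint8_t *a,
                  const uint8_t *b, const uint8_t *else_vals,
                  int nunits, int len, int bias)
{
  for (int i = 0; i < nunits; i++)
    res[i] = (i < len + bias && mask[i])
             ? (uint8_t) (a[i] / b[i])
             : else_vals[i];
}
```

In these terms, the combined form simply passes the merged values as the
"else" operand instead of selecting them with a separate .VCOND_MASK.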

gcc/ChangeLog:

* gimple-match.h (gimple_match_op::gimple_match_op):
Add interfaces for more arguments.
(gimple_match_op::set_op): Add interfaces for more arguments.
* match.pd: Add support of combining cond_len_op + vec_cond.

Haochen Jiang [Tue, 31 Oct 2023 05:33:49 +0000 (13:33 +0800)] 
Fix incorrect option mask and avx512cd target push

gcc/ChangeLog:

* config/i386/avx512cdintrin.h (target): Push evex512 for
avx512cd.
* config/i386/avx512vlintrin.h (target): Split avx512cdvl part
out from avx512vl.
* config/i386/i386-builtin.def (BDESC): Do not check evex512
for builtins not needed.

Lehua Ding [Tue, 31 Oct 2023 03:18:28 +0000 (11:18 +0800)] 
RISC-V: Add the missed combine of [u]int64 -> _Float16 and vcond

Hi,

This patch lets the INT64 to FP16 conversion be split into two smaller
conversions (INT64 -> FP32 and FP32 -> FP16) at expand time instead of
delaying the split to the split1 pass. This change makes it possible to
combine the FP32 to FP16 and vcond patterns, so we don't need to add a
combine pattern for the INT64 to FP16 and vcond patterns.

Consider this code:
  void
  foo (_Float16 *__restrict r, int64_t *__restrict a, _Float16 *__restrict b,
       int64_t *__restrict pred, int n)
  {
    for (int i = 0; i < n; i += 1)
      {
        r[i] = pred[i] ? (_Float16) a[i] : b[i];
      }
  }

Before this patch:
  ...
  vfncvt.f.f.w    v2,v2
  vmerge.vvm      v1,v1,v2,v0
  vse16.v v1,0(a0)
  ...

After this patch:
  ...
  vfncvt.f.f.w    v1,v2,v0.t
  vse16.v v1,0(a0)
  ...

gcc/ChangeLog:

* config/riscv/autovec.md (<float_cvt><mode><vnnconvert>2):
Change to define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c:
Add vfncvt.f.f.w assert.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c:
Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c:
Ditto.

liuhongt [Mon, 30 Oct 2023 06:05:25 +0000 (14:05 +0800)] 
Fix wrong code due to incorrect define_split

-(define_split
-  [(set (match_operand:V2HI 0 "register_operand")
-        (eq:V2HI
-          (eq:V2HI
-            (us_minus:V2HI
-              (match_operand:V2HI 1 "register_operand")
-              (match_operand:V2HI 2 "register_operand"))
-            (match_operand:V2HI 3 "const0_operand"))
-          (match_operand:V2HI 4 "const0_operand")))]
-  "TARGET_SSE4_1"
-  [(set (match_dup 0)
-        (umin:V2HI (match_dup 1) (match_dup 2)))
-   (set (match_dup 0)
-        (eq:V2HI (match_dup 0) (match_dup 2)))])

The splitter is wrong when op1 == op2: the original pattern returns 0, but
after the split it returns 1.  So remove the splitter.
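
The mismatch can be reproduced with a per-element scalar model of the two RTL
sequences.  This is a sketch: the helper names are hypothetical, and "true" is
represented as all-ones (0xFFFF), as a vector compare would produce it.

```c
#include <stdint.h>

/* Per-element vector compare: all-ones for true, zero for false.  */
uint16_t
cmp_eq (uint16_t x, uint16_t y)
{
  return x == y ? 0xFFFF : 0;
}

/* Unsigned saturating subtraction.  */
uint16_t
us_minus (uint16_t x, uint16_t y)
{
  return x > y ? (uint16_t) (x - y) : 0;
}

uint16_t
umin (uint16_t x, uint16_t y)
{
  return x < y ? x : y;
}

/* (eq (eq (us_minus op1 op2) 0) 0), i.e. true iff op1 > op2.  */
uint16_t
original_pattern (uint16_t op1, uint16_t op2)
{
  return cmp_eq (cmp_eq (us_minus (op1, op2), 0), 0);
}

/* op0 = umin (op1, op2); op0 = (eq op0 op2), i.e. true iff op1 >= op2.  */
uint16_t
after_split (uint16_t op1, uint16_t op2)
{
  uint16_t op0 = umin (op1, op2);
  return cmp_eq (op0, op2);
}
```

The two functions agree for op1 != op2 and differ exactly when the operands
are equal: the original is a strict "greater than", the split form a
"greater than or equal".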

Also extend another define_split to define_insn_and_split to handle
below pattern

(set (reg:V4QI 112)
    (unspec:V4QI [
            (subreg:V4QI (reg:V2HF 111 [ bf ]) 0)
            (subreg:V4QI (reg:V2HF 110 [ af ]) 0)
            (subreg:V4QI (eq:V2HI (eq:V2HI (reg:V2HI 105)
                        (const_vector:V2HI [
                                (const_int 0 [0]) repeated x2
                            ]))
                    (const_vector:V2HI [
                            (const_int 0 [0]) repeated x2
                        ])) 0)
        ] UNSPEC_BLENDV))

define_split doesn't work since pass_combine assumes it produces at
most 2 insns after split, but here it produces 3 since we need to move
const0_rtx (V2HImode) to reg. The move insn can be eliminated later.

gcc/ChangeLog:

PR target/112276
* config/i386/mmx.md (*mmx_pblendvb_v8qi_1): Change
define_split to define_insn_and_split to handle
immediate_operand for comparison.
(*mmx_pblendvb_v8qi_2): Ditto.
(*mmx_pblendvb_<mode>_1): Ditto.
(*mmx_pblendvb_v4qi_2): Ditto.
(<code><mode>3): Remove define_split after it.
(<code>v8qi3): Ditto.
(<code><mode>3): Ditto.
(<code>v2hi3): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/part-vect-vcondhf.C: Adjust testcase.
* gcc.target/i386/pr112276.c: New test.

Andrew Pinski [Sat, 28 Oct 2023 02:23:52 +0000 (19:23 -0700)] 
MATCH: Add some more value_replacement simplifications to match

This moves a few more value_replacements simplifications to match.
/* a == 1 ? b : a * b -> a * b */
/* a == 1 ? b : b / a  -> b / a */
/* a == -1 ? b : a & b -> a & b */

Also adds a testcase to show that we can catch these where value_replacement
would not (but other passes would).
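
Written out as C functions, the three simplifications look as follows.  The
function names are hypothetical; the sketch only shows why each conditional
form computes the same value as its unconditional counterpart.

```c
/* a == 1 ? b : a * b  ->  a * b  (when a == 1, a * b is b).  */
unsigned mul_cond (unsigned a, unsigned b) { return a == 1 ? b : a * b; }
unsigned mul_plain (unsigned a, unsigned b) { return a * b; }

/* a == 1 ? b : b / a  ->  b / a  (when a == 1, b / a is b).
   Both forms require a != 0.  */
unsigned div_cond (unsigned a, unsigned b) { return a == 1 ? b : b / a; }
unsigned div_plain (unsigned a, unsigned b) { return b / a; }

/* a == -1 ? b : a & b  ->  a & b  (when a is all-ones, a & b is b).  */
int and_cond (int a, int b) { return a == -1 ? b : a & b; }
int and_plain (int a, int b) { return a & b; }
```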

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (`a == 1 ? b : a OP b`): New pattern.
(`a == -1 ? b : a & b`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-value-4.c: New test.

Andrew Pinski [Thu, 26 Oct 2023 22:07:53 +0000 (15:07 -0700)] 
MATCH: first of the value replacement moving from phiopt

This moves a few simple patterns that are done in value replacement
in phiopt over to match.pd. Just the simple ones which might show up
in other code.

This allows some optimizations to happen without depending on sinking
having happened, and in some cases where phiopt is not invoked
(cond-1.c is an example there).

Changes since v1:
* v2: Add an extra testcase to showcase improvements at -O1.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd: (`a == 0 ? b : b + a`,
`a == 0 ? b : b - a`): New patterns.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/cond-1.c: New test.
* gcc.dg/tree-ssa/phi-opt-value-1.c: New test.
* gcc.dg/tree-ssa/phi-opt-value-1a.c: New test.
* gcc.dg/tree-ssa/phi-opt-value-2.c: New test.

GCC Administrator [Tue, 31 Oct 2023 00:17:32 +0000 (00:17 +0000)] 
Daily bump.

Mayshao [Mon, 30 Oct 2023 21:19:12 +0000 (22:19 +0100)] 
i386: Zhaoxin yongfeng enablement

Enable -march/-mtune=yongfeng. Costs and tunings are set according
to the characteristics of the processor. Add a new .md file to describe
yongfeng processor.

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize yongfeng.
* common/config/i386/i386-common.cc: Add yongfeng.
* common/config/i386/i386-cpuinfo.h (enum processor_subtypes):
Add ZHAOXIN_FAM7H_YONGFENG.
* config.gcc: Add yongfeng.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Let -march=native recognize yongfeng processors.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add yongfeng.
* config/i386/i386-options.cc (m_YONGFENG): New definition.
(m_ZHAOXIN): Ditto.
* config/i386/i386.h (enum processor_type): Add PROCESSOR_YONGFENG.
* config/i386/i386.md: Add yongfeng.
* config/i386/lujiazui.md: Fix typo.
* config/i386/x86-tune-costs.h (struct processor_costs):
Add yongfeng costs.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add yongfeng.
(ix86_adjust_cost): Ditto.
* config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Replace
m_LUJIAZUI with m_ZHAOXIN.
(X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_MOVX): Ditto.
(X86_TUNE_MEMORY_MISMATCH_STALL): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto.
(X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto.
(X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto.
(X86_TUNE_USE_LEAVE): Ditto.
(X86_TUNE_PUSH_MEMORY): Ditto.
(X86_TUNE_LCP_STALL): Ditto.
(X86_TUNE_INTEGER_DFMODE_MOVES): Ditto.
(X86_TUNE_OPT_AGU): Ditto.
(X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto.
(X86_TUNE_USE_SAHF): Ditto.
(X86_TUNE_USE_BT): Ditto.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto.
(X86_TUNE_ONE_IF_CONV_INSN): Ditto.
(X86_TUNE_AVOID_MFENCE): Ditto.
(X86_TUNE_EXPAND_ABS): Ditto.
(X86_TUNE_USE_SIMODE_FIOP): Ditto.
(X86_TUNE_USE_FFREEP): Ditto.
(X86_TUNE_EXT_80387_CONSTANTS): Ditto.
(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto.
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto.
(X86_TUNE_SSE_TYPELESS_STORES): Ditto.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto.
(X86_TUNE_USE_GATHER_2PARTS): Add m_YONGFENG.
(X86_TUNE_USE_GATHER_4PARTS): Ditto.
(X86_TUNE_USE_GATHER_8PARTS): Ditto.
(X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
* doc/extend.texi: Add details about yongfeng.
* doc/invoke.texi: Ditto.
* config/i386/yongfeng.md: New file to describe yongfeng processor.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv32.C: Handle new -march.
* gcc.target/i386/funcspec-56.inc: Ditto.

François Dumont [Mon, 30 Oct 2023 21:07:49 +0000 (22:07 +0100)] 
libstdc++: [_GLIBCXX_INLINE_VERSION] Add comment on emul TLS symbols

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu-versioned-namespace.ver: Add comment on recently
added emul TLS symbols.

François Dumont [Mon, 30 Oct 2023 18:35:35 +0000 (19:35 +0100)] 
libstdc++: [_GLIBCXX_INLINE_VERSION] Un-weak handle_contract_violation

libstdc++-v3/ChangeLog:

* src/experimental/contract.cc
[_GLIBCXX_INLINE_VERSION](handle_contract_violation): Rework comment.
Remove weak attribute.

Iain Sandoe [Mon, 30 Oct 2023 18:55:31 +0000 (18:55 +0000)] 
configure, fixincludes: Add change missed in r14-4825.

This corrects an oversight in the r14-4825 commit.

fixincludes/ChangeLog:

* configure: Regenerate.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

Martin Jambor [Mon, 30 Oct 2023 17:34:59 +0000 (18:34 +0100)]
ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157)

PR 111157 shows that IPA-modref and IPA-CP (when plugged into value
numbering) can optimize out a store both before a call (because the
call will overwrite it) and in the call (because the store is of the
same value) and by eliminating both create miscompilation.

This patch fixes that by pruning from the list of IPA-CP aggregate
value constants any constant whose memory contents IPA-modref knows can
be "killed."  Unfortunately, doing so is tricky.  First, IPA-modref
loads override kills and so only stores not loaded are truly not
necessary.  Looking stuff up there means doing most of what
modref_may_alias does, but doing exactly what it does is tricky
because it also takes aliasing into account and has bail-out counters.

To err on the side of caution in order to avoid this miscompilation we
have to prune a constant when in doubt.  However, pruning can
interfere with the mechanism of how clone materialization
distinguishes between the cases when a parameter was entirely removed
and when it was both IPA-CPed and IPA-SRAed (in order to make up for
the removal in debug info, which can bump into an assert when
compiling g++.dg/torture/pr103669.C when we are not careful).

Therefore this patch:

  1) marks constants that IPA-modref has in its kill list with a new
     "killed" flag, and
  2) prunes the list from entries with this flag after materialization
     and IPA-CP transformation is done using the template introduced in
     the previous patch

It does not try to look up anything in the load lists; this will be
done as a follow-up in order to ease review.
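
The two steps (mark first, prune later) can be sketched with a plain array
and a predicate-driven filter.  This is an illustration only: struct
agg_value is a much simplified, hypothetical stand-in for ipa_argagg_value,
and remove_values_if merely mimics the role of the filtering template
mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for an aggregate constant record.  */
struct agg_value
{
  unsigned index;        /* Parameter number.  */
  unsigned unit_offset;  /* Offset within the aggregate.  */
  long value;            /* The known constant.  */
  bool killed;           /* Step 1 sets this; step 2 prunes on it.  */
};

/* Remove every element for which PRED returns true, preserving the
   order of the remaining ones.  Returns the new element count.  */
size_t
remove_values_if (struct agg_value *vals, size_t n,
                  bool (*pred) (const struct agg_value *))
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (!pred (&vals[i]))
      vals[kept++] = vals[i];
  return kept;
}

/* Predicate used for the final pruning step.  */
bool
value_is_killed (const struct agg_value *v)
{
  return v->killed;
}
```

Keeping the marking separate from the removal is what lets the entries stay
visible during clone materialization and disappear only afterwards.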

gcc/ChangeLog:

2023-10-27  Martin Jambor  <mjambor@suse.cz>

PR ipa/111157
* ipa-prop.h (struct ipa_argagg_value): New flag killed.
* ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function.
(update_signature): Mark any IPA-CP aggregate constants at
positions known to be killed as killed.  Move check that there is
clone_info after this pruning.
* ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag.
(ipa_argagg_value_list::push_adjusted_values): Clear the new flag.
(push_agg_values_from_plats): Likewise.
(ipa_push_agg_values_from_jfunc): Likewise.
(estimate_local_effects): Likewise.
(push_agg_values_for_index_from_edge): Likewise.
* ipa-prop.cc (write_ipcp_transformation_info): Stream the killed
flag.
(read_ipcp_transformation_info): Likewise.
(ipcp_get_aggregate_const): Update comment, assert that encountered
record does not have killed flag set.
(ipcp_transform_function): Prune all aggregate constants with killed
set.

gcc/testsuite/ChangeLog:

2023-09-18  Martin Jambor  <mjambor@suse.cz>

PR ipa/111157
* gcc.dg/lto/pr111157_0.c: New test.
* gcc.dg/lto/pr111157_1.c: Second file of the same new test.

Martin Jambor [Mon, 30 Oct 2023 17:34:59 +0000 (18:34 +0100)] 
ipa-cp: Templatize filtering of m_agg_values

PR 111157 points to another place where IPA-CP collected aggregate
compile-time constants need to be filtered, in addition to the one
place that already does this in ipa-sra.  In order to re-use code,
this patch turns the common bit into a template.

The functionality is still covered by testcase gcc.dg/ipa/pr108959.c.

gcc/ChangeLog:

2023-09-13  Martin Jambor  <mjambor@suse.cz>

PR ipa/111157
* ipa-prop.h (ipcp_transformation): New member function template
remove_argaggs_if.
* ipa-sra.cc (zap_useless_ipcp_results): Use remove_argaggs_if to
filter aggregate constants.

Patrick O'Neill [Mon, 30 Oct 2023 16:30:01 +0000 (09:30 -0700)] 
RISC-V: Make rv32i_zcmp testcase more robust

GCC recently changed its register allocator which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any s register in the range of 1-9 for cm.push and cm.popret insns.
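
The effect of such a relaxed regex can be tried out with POSIX regex.h.  The
pattern below is a hypothetical example written for this sketch, not the one
used in the testcase, and matches_push_insn is an invented helper.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if ASM_LINE contains a cm.push whose register list is
   {ra, s0-sN} with N anywhere in 1..9 (hypothetical pattern).  */
bool
matches_push_insn (const char *asm_line)
{
  static const char pattern[]
    = "cm\\.push[[:space:]]+\\{ra,[[:space:]]*s0-s[1-9]\\}";
  regex_t re;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, asm_line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

A test written this way keeps passing when the register allocator picks a
different highest saved register, which is the robustness the patch is after.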

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32i_zcmp.c: Accept any register in the
range of 1-9 for cm.push and cm.popret insns.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

Roger Sayle [Mon, 30 Oct 2023 16:21:28 +0000 (16:21 +0000)]
ARC: Convert (signed<<31)>>31 to -(signed&1) without barrel shifter.

This patch optimizes PR middle-end/101955 for the ARC backend.  On ARC
CPUs with a barrel shifter, using two shifts is optimal as:

        asl_s   r0,r0,31
        asr_s   r0,r0,31

but without a barrel shifter, GCC -O2 -mcpu=em currently generates:

        and     r2,r0,1
        ror     r2,r2
        add.f   0,r2,r2
        sbc     r0,r0,r0

with this patch, we now generate the smaller, faster and non-flags
clobbering:

        bmsk_s  r0,r0,0
        neg_s   r0,r0
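
The identity behind the transformation can be checked in C.  This is a sketch
with hypothetical function names; it relies on GCC's documented behavior that
right-shifting a negative signed value is an arithmetic shift and that
unsigned-to-signed conversion wraps.

```c
#include <stdint.h>

/* (x << 31) >> 31: replicate bit 0 across the whole word.  The left
   shift is done in unsigned arithmetic to avoid signed overflow.  */
int32_t
sext_bit0_shifts (int32_t x)
{
  return (int32_t) ((uint32_t) x << 31) >> 31;
}

/* -(x & 1): mask out bit 0, then negate (0 -> 0, 1 -> -1).  */
int32_t
sext_bit0_and_neg (int32_t x)
{
  return -(x & 1);
}
```

Both functions return -1 for odd inputs and 0 for even inputs.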

2023-10-30  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR middle-end/101955
* config/arc/arc.md (*extvsi_1_0): New define_insn_and_split
to convert sign extract of the least significant bit into an
AND $1 then a NEG when !TARGET_BARREL_SHIFTER.

gcc/testsuite/ChangeLog
PR middle-end/101955
* gcc.target/arc/pr101955.c: New test case.

Roger Sayle [Mon, 30 Oct 2023 16:17:42 +0000 (16:17 +0000)] 
ARC: Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.

This patch overhauls the ARC backend's insn_cost target hook, and makes
some related improvements to rtx_costs, BRANCH_COST, etc.  The primary
goal is to allow the backend to indicate that shifts and rotates are
slow (discouraged) when the CPU doesn't have a barrel shifter. I should
also acknowledge Richard Sandiford for inspiring the use of set_cost
in this rewrite of arc_insn_cost; this implementation borrows heavily
from the target hooks for AArch64 and ARM.

The motivating example is derived from PR rtl-optimization/110717.

struct S { int a : 5; };
unsigned int foo (struct S *p) {
  return p->a;
}

With a barrel shifter, GCC -O2 generates the reasonable:

foo:    ldb_s   r0,[r0]
        asl_s   r0,r0,27
        j_s.d   [blink]
        asr_s   r0,r0,27

What's interesting is that during combine, the middle-end actually
has two shifts by three bits, and a sign-extension from QI to SI.

Trying 8, 9 -> 11:
    8: r158:SI=r157:QI#0<<0x3
      REG_DEAD r157:QI
    9: r159:SI=sign_extend(r158:SI#0)
      REG_DEAD r158:SI
   11: r155:SI=r159:SI>>0x3
      REG_DEAD r159:SI

Whilst it's reasonable to simplify this to two shifts by 27 bits when
the CPU has a barrel shifter, it's actually a significant pessimization
when these shifts are implemented by loops.  This combination can be
prevented if the backend provides accurate-ish estimates for insn_cost.

Previously, without a barrel shifter, GCC -O2 -mcpu=em generates:

foo:    ldb_s   r0,[r0]
        mov     lp_count,27
        lp      2f
        add     r0,r0,r0
        nop
2:      # end single insn loop
        mov     lp_count,27
        lp      2f
        asr     r0,r0
        nop
2:      # end single insn loop
        j_s     [blink]

which contains two loops and requires about 113 cycles to execute.
With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:

foo:    ldb_s   r0,[r0]
        mov_s   r2,0    ;3
        add3    r0,r2,r0
        sexb_s  r0,r0
        asr_s   r0,r0
        asr_s   r0,r0
        j_s.d   [blink]
        asr_s   r0,r0

which requires only ~6 cycles, for the shorter shifts by 3 and sign
extension.
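
As a rough illustration (not part of the patch), the two instruction
sequences above can be modelled in C to show that they compute the same
value.  The function names are hypothetical, the bit-field is assumed to
occupy the low 5 bits of the loaded byte, and the code relies on GCC's
implementation-defined behaviour for narrowing conversions and for right
shifts of negative values (arithmetic shift).

```c
#include <stdint.h>

/* Barrel-shifter form: move the 5-bit field to the top of the word,
   then shift it back down arithmetically (asl 27, asr 27).  */
int32_t extract_by_27(uint8_t byte)
{
    uint32_t up = (uint32_t)byte << 27;
    /* Assumes two's-complement conversion and arithmetic right shift.  */
    return (int32_t)up >> 27;
}

/* Form chosen without a barrel shifter: shift left by 3 (add3 with a
   zero register), sign-extend from 8 bits (sexb), then perform three
   single-bit arithmetic right shifts (asr).  */
int32_t extract_by_3(uint8_t byte)
{
    int32_t v = (int8_t)(uint8_t)(byte << 3);  /* add3 + sexb */
    v >>= 1;                                   /* asr */
    v >>= 1;                                   /* asr */
    v >>= 1;                                   /* asr */
    return v;
}
```

Both helpers return the same result for every possible byte value; only
the cost on a CPU without a barrel shifter differs.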

2023-10-30  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
Provide reasonable values for SHIFTS and ROTATES by constant
bit counts depending upon TARGET_BARREL_SHIFTER.
(arc_insn_cost): Use insn attributes if the instruction is
recognized.  Avoid calling get_attr_length for type "multi",
i.e. define_insn_and_split patterns without explicit type.
Fall-back to set_rtx_cost for single_set and pattern_cost
otherwise.
* config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
(BRANCH_COST): Improve/correct definition.
(LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.

21 months agoARC: Improved SImode shifts and rotates with -mswap.
Roger Sayle [Mon, 30 Oct 2023 16:12:30 +0000 (16:12 +0000)] 
ARC: Improved SImode shifts and rotates with -mswap.

This patch improves the code generated by the ARC back-end for CPUs
without a barrel shifter but with -mswap.  The -mswap option provides
a SWAP instruction that implements SImode rotations by 16, but also
logical shift instructions (left and right) by 16 bits.  Clearly these
are also useful building blocks for implementing shifts by 17, 18, etc.
which would otherwise require a loop.

As a representative example:
int shl20 (int x) { return x << 20; }

GCC with -O2 -mcpu=em -mswap would previously generate:

shl20:  mov     lp_count,10
        lp      2f
        add     r0,r0,r0
        add     r0,r0,r0
2:      # end single insn loop
        j_s     [blink]

with this patch we now generate:

shl20:  mov_s   r2,0    ;3
        lsl16   r0,r0
        add3    r0,r2,r0
        j_s.d   [blink]
        asl_s   r0,r0

Although both are four instructions (excluding the j_s),
the original takes ~22 cycles and the replacement ~4 cycles.
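
A minimal C sketch of why the new sequence is equivalent, assuming
unsigned arithmetic to sidestep signed-overflow issues (the function
names are hypothetical and not taken from the patch): the shift by 20 is
decomposed as 16 + 3 + 1.

```c
#include <stdint.h>

/* Models the new sequence: lsl16, add3 with a zero register, asl_s.  */
uint32_t shl20_via_lsl16(uint32_t x)
{
    x <<= 16;            /* lsl16 r0,r0 */
    x = 0u + (x << 3);   /* add3 r0,r2,r0 with r2 == 0 */
    x <<= 1;             /* asl_s r0,r0 */
    return x;
}

/* Models the old zero-overhead loop: 10 iterations, each doubling twice.  */
uint32_t shl20_via_loop(uint32_t x)
{
    for (int i = 0; i < 10; i++) {
        x += x;          /* add r0,r0,r0 */
        x += x;          /* add r0,r0,r0 */
    }
    return x;
}
```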

2023-10-30  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/arc/arc.cc (arc_split_ashl): Use lsl16 on TARGET_SWAP.
(arc_split_ashr): Use swap and sign-extend on TARGET_SWAP.
(arc_split_lshr): Use lsr16 on TARGET_SWAP.
(arc_split_rotl): Use swap on TARGET_SWAP.
(arc_split_rotr): Likewise.
* config/arc/arc.md (ANY_ROTATE): New code iterator.
(<ANY_ROTATE>si2_cnt16): New define_insn for alternate form of
swap instruction on TARGET_SWAP.
(ashlsi2_cnt16): Rename from *ashlsi16_cnt16 and move earlier.
(lshrsi2_cnt16): New define_insn for LSR16 instruction.
(*ashlsi2_cnt16): See above.

gcc/testsuite/ChangeLog
* gcc.target/arc/lsl16-1.c: New test case.
* gcc.target/arc/lsr16-1.c: Likewise.
* gcc.target/arc/swap-1.c: Likewise.
* gcc.target/arc/swap-2.c: Likewise.

21 months agoarm: move the switch tables for Arm to the RO data section.
Richard Ball [Mon, 30 Oct 2023 15:31:26 +0000 (15:31 +0000)] 
arm: move the switch tables for Arm to the RO data section.

Follow-up patch to "arm: Use deltas for Arm switch tables".
This patch moves the switch tables for Arm from the .text section
into the .rodata section.

gcc/ChangeLog:

* config/arm/aout.h: Change to use the Lrtx label.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove arm targets
from (!target_pure_code) condition.
(ADDR_VEC_ALIGN): Add align for tables in rodata section.
* config/arm/arm.cc (arm_output_casesi): Alter the function to include
.Lrtx label and remove adr instructions.
* config/arm/arm.md
(arm_casesi_internal): Use force_reg to generate ldr instructions that
would otherwise be out of range, and change rtl to accommodate force reg.
Additionally remove unnecessary register temp.
(casesi): Remove pure code check for Arm.
* config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Remove arm
targets from JUMP_TABLES_IN_TEXT_SECTION definition.

gcc/testsuite/ChangeLog:

* gcc.target/arm/arm-switchstatement.c: Alter the tests to
change adr instruction to ldr.

21 months agoTestsuite, i386: Mark test as requiring ifunc
Francois-Xavier Coudert [Mon, 30 Oct 2023 14:41:10 +0000 (15:41 +0100)] 
Testsuite, i386: Mark test as requiring ifunc

Test is currently failing on x86_64-apple-darwin.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr105554.c: Require ifunc.

21 months agoTestsuite, Darwin: Fix trampoline warning
Francois-Xavier Coudert [Mon, 30 Oct 2023 13:45:47 +0000 (14:45 +0100)] 
Testsuite, Darwin: Fix trampoline warning

Heap-based trampolines are enabled on darwin20 and later,
meaning that no warning is emitted.

gcc/testsuite/ChangeLog:

* gcc.dg/Wtrampolines.c: Skip on darwin20 and later.

21 months agoTestsuite, i386: Fix test by passing -march
Francois-Xavier Coudert [Mon, 30 Oct 2023 11:50:01 +0000 (12:50 +0100)] 
Testsuite, i386: Fix test by passing -march

The test currently fails on Darwin, where the default arch is core2.

gcc/testsuite/ChangeLog:

PR target/112287
* gcc.target/i386/pr111698.c: Pass -march=sandybridge.

21 months agoTestsuite, Darwin: skip PIE test
Francois-Xavier Coudert [Mon, 30 Oct 2023 11:41:17 +0000 (12:41 +0100)] 
Testsuite, Darwin: skip PIE test

gcc/testsuite/ChangeLog:

* gcc.dg/pie-2.c: Skip test on darwin.

21 months agors6000: Change bitwise xor to an equality operator [PR106907]
Jeevitha [Mon, 30 Oct 2023 09:07:07 +0000 (04:07 -0500)] 
rs6000: Change bitwise xor to an equality operator [PR106907]

PR106907 reports a few cppcheck warnings asking for operator precedence
to be clarified.  The bitwise xor has been replaced with an equality
check, which achieves the same result.
Additionally, comment indentation has been fixed.
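
The exact expression in rs6000.cc is not reproduced here.  As a generic
sketch with hypothetical names, a bitwise xor of one boolean with the
negation of another is equivalent to a plain equality test, which reads
more clearly and needs no precedence reasoning:

```c
#include <stdbool.h>

/* Mixes bitwise xor with logical negation; easy to misread.  */
bool flags_agree_xor(bool a, bool b)
{
    return a ^ !b;
}

/* Same truth table, written as an equality.  */
bool flags_agree_eq(bool a, bool b)
{
    return a == b;
}
```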

2023-10-11  Jeevitha Palanisamy  <jeevitha@linux.ibm.com>

gcc/
PR target/106907
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Change bitwise
xor to an equality and fix comment indentation.

21 months agoPR testsuite/111462 - add powerpc64le to list of ssa-sink-18.c XFAIL
Richard Biener [Mon, 30 Oct 2023 10:01:17 +0000 (11:01 +0100)] 
PR testsuite/111462 - add powerpc64le to list of ssa-sink-18.c XFAIL

PR testsuite/111462
gcc/testsuite/
* gcc.dg/tree-ssa/ssa-sink-18.c: XFAIL also powerpc64le.

21 months agoRISC-V: Fix bugs of handling scalar of SEW64 vx instruction in RV32
Juzhe-Zhong [Sat, 28 Oct 2023 02:05:07 +0000 (10:05 +0800)] 
RISC-V: Fix bugs of handling scalar of SEW64 vx instruction in RV32

sew64_scalar_helper handles SEW64 vx instruction patterns on RV32 systems.
According to the RVV ISA, we cannot directly use a SEW64 vx instruction on RV32,
since RV32 general-purpose registers are only 32 bits wide.

Consider the following case:

vsetvl e64m1
vadd.vx v,v,x

is transformed by sew64_scalar_helper into:

vsetvl e64m1
sw
sw
vlse v
vadd.vv
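
A scalar C model of the transformed sequence may make the idea clearer.
This is only a sketch: the function name is hypothetical, and it assumes
that the vlse acts as a zero-stride load, so that every element reads
the same 64-bit value assembled from the two stored 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds a 64-bit scalar, supplied as two 32-bit halves, to each of the
   vl elements of v.  */
void broadcast_add_u64(uint64_t *v, size_t vl, uint32_t lo, uint32_t hi)
{
    uint32_t slot[2];
    slot[0] = lo;   /* sw: low word to the stack slot */
    slot[1] = hi;   /* sw: high word to the stack slot */

    for (size_t i = 0; i < vl; i++) {
        /* vlse (assumed stride 0): each element loads the same value.  */
        uint64_t scalar = ((uint64_t)slot[1] << 32) | slot[0];
        v[i] += scalar;   /* vadd.vv */
    }
}
```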

This bug was reported by Robin:
(insn 143 179 230 9 (set (reg:SI 15 a5 [234])
        (unspec:SI [
                (const_int 64 [0x40])
            ] UNSPEC_VLMAX)) 751 {vlmax_avlsi}
     (expr_list:REG_EQUIV (unspec:SI [
                (const_int 64 [0x40])
            ] UNSPEC_VLMAX)
        (nil)))
(insn 230 143 78 9 (parallel [
            (set (reg:SI 66 vl)
                (unspec:SI [
                        (reg:SI 15 a5 [234])
                        (const_int 64 [0x40])
                        (const_int 0 [0])
                    ] UNSPEC_VSETVL))
            (set (reg:SI 67 vtype)
                (unspec:SI [
                        (const_int 64 [0x40])
                        (const_int 0 [0])
                        (const_int 1 [0x1]) repeated x2
                    ] UNSPEC_VSETVL))
        ]) "bug.c":14:14 discrim 1 1469 {vsetvl_discard_resultsi}
     (nil))
(insn 78 230 84 9 (set (reg:RVVM1DI 102 v6 [203])
        (if_then_else:RVVM1DI (unspec:RVVMF64BI [
                    (const_vector:RVVMF64BI repeat [
                            (const_int 1 [0x1])
                        ])
                    (const_int 0 [0])
                    (const_int 2 [0x2]) repeated x2
                    (const_int 0 [0])
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (vec_duplicate:RVVM1DI (mem/u/c:DI (reg/f:SI 29 t4 [230]) [0  S8 A64]))
            (unspec:RVVM1DI [
                    (reg:SI 0 zero)
                ] UNSPEC_VUNDEF))) "bug.c":14:14 discrim 1 1872 {*pred_broadcastrvvm1di}
     (expr_list:REG_DEAD (reg/f:SI 29 t4 [230])
        (nil)))

The root cause is that VLMAX handling was missed: this code was written long ago,
when the only callers were intrinsics, which never produce a VLMAX situation.

The following failures are fixed by this patch:

FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test

gcc/ChangeLog:

* config/riscv/riscv-protos.h (sew64_scalar_helper): Fix bug.
* config/riscv/riscv-v.cc (sew64_scalar_helper): Ditto.
* config/riscv/vector.md: Ditto.

21 months agoFortran: Fix a problem with SELECT TYPE selectors [PR104555].
Paul Thomas [Mon, 30 Oct 2023 07:12:40 +0000 (07:12 +0000)] 
Fortran: Fix a problem with SELECT TYPE selectors [PR104555].

2023-10-30  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
PR fortran/104555
* resolve.cc (resolve_select_type): If the selector expression
has no class component references and the expression is a
derived type, copy the typespec of the symbol to that of the
expression.

gcc/testsuite/
PR fortran/104555
* gfortran.dg/pr104555.f90: New test.

21 months agoImprove memcmpeq for 512-bit vector with vpcmpeq + kortest.
liuhongt [Mon, 9 Oct 2023 07:07:54 +0000 (15:07 +0800)] 
Improve memcmpeq for 512-bit vector with vpcmpeq + kortest.

When the two vectors are equal, the kmask is all ones and kortest sets CF;
otherwise CF is cleared.

So the CF bit can be used to check the result of the comparison.

Before:
        vmovdqu (%rsi), %ymm0
        vpxorq  (%rdi), %ymm0, %ymm0
        vptest  %ymm0, %ymm0
        jne     .L2
        vmovdqu 32(%rsi), %ymm0
        vpxorq  32(%rdi), %ymm0, %ymm0
        vptest  %ymm0, %ymm0
        je      .L5
.L2:
        movl    $1, %eax
        xorl    $1, %eax
        vzeroupper
        ret

After:
        vmovdqu64       (%rsi), %zmm0
        xorl    %eax, %eax
        vpcmpeqd        (%rdi), %zmm0, %k0
        kortestw        %k0, %k0
        setc    %al
        vzeroupper
        ret
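
The flag logic can be modelled in plain C.  This is a sketch with
hypothetical helper names, not the code GCC emits: vpcmpeqd on 512-bit
operands produces one mask bit per 32-bit lane, and kortestw sets CF
only when the OR of its two mask operands is all ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of vpcmpeqd on 16 dword lanes: bit i is set when lane i matches.  */
uint16_t cmpeq_mask16(const uint32_t a[16], const uint32_t b[16])
{
    uint16_t k = 0;
    for (int i = 0; i < 16; i++)
        if (a[i] == b[i])
            k |= (uint16_t)(1u << i);
    return k;
}

/* Model of CF after kortestw: set only if (k1 | k2) is all ones.  */
bool kortestw_cf(uint16_t k1, uint16_t k2)
{
    return (uint16_t)(k1 | k2) == 0xFFFF;
}

/* Equality of two 64-byte blocks, as in the "After" sequence.  */
bool blocks_equal_64(const uint32_t a[16], const uint32_t b[16])
{
    uint16_t k = cmpeq_mask16(a, b);
    return kortestw_cf(k, k);   /* setc %al */
}
```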

gcc/ChangeLog:

PR target/104610
* config/i386/i386-expand.cc (ix86_expand_branch): Handle
512-bit vector with vpcmpeq + kortest.
* config/i386/i386.md (cbranchxi4): New expander.
* config/i386/sse.md: (cbranch<mode>4): Extend to V16SImode
and V8DImode.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr104610-2.c: New test.

21 months agoExpand: Checking available optabs for scalar modes in by pieces operations
Haochen Gui [Mon, 30 Oct 2023 02:59:51 +0000 (10:59 +0800)] 
Expand: Checking available optabs for scalar modes in by pieces operations

The earlier patch (f08ca5903c7) checks scalar modes with the target
hook scalar_mode_supported_p.  That caused some i386 regressions because
XImode and OImode are not enabled in the i386 implementation of that hook.
This patch instead checks whether the corresponding optabs are available
for the mode.

gcc/
PR target/111449
* expr.cc (qi_vector_mode_supported_p): Rename to...
(by_pieces_mode_supported_p): ...this, and extends it to do
the checking for both scalar and vector mode.
(widest_fixed_size_mode_for_size): Call
by_pieces_mode_supported_p to examine the mode.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.

21 months agoDaily bump.
GCC Administrator [Mon, 30 Oct 2023 00:17:23 +0000 (00:17 +0000)] 
Daily bump.

21 months agolibstdc++: [_GLIBCXX_INLINE_VERSION] Add emul TLS symbols
François Dumont [Wed, 11 Oct 2023 05:09:09 +0000 (07:09 +0200)] 
libstdc++: [_GLIBCXX_INLINE_VERSION] Add emul TLS symbols

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu-versioned-namespace.ver: Add missing emul TLS
symbols.

21 months agolibstdc++: [_GLIBCXX_INLINE_VERSION] Provide handle_contract_violation symbol
François Dumont [Tue, 19 Sep 2023 16:56:57 +0000 (18:56 +0200)] 
libstdc++: [_GLIBCXX_INLINE_VERSION] Provide handle_contract_violation symbol

libstdc++-v3/ChangeLog:

* src/experimental/contract.cc
[_GLIBCXX_INLINE_VERSION](handle_contract_violation): Provide symbol
without version namespace decoration for gcc.

21 months agod: Fix ICE: verify_gimple_failed (conversion of register to a different size in ...
Iain Buclaw [Sun, 29 Oct 2023 19:13:14 +0000 (20:13 +0100)] 
d: Fix ICE: verify_gimple_failed (conversion of register to a different size in 'view_convert_expr')

Static arrays in D are passed around by value, rather than decaying to a
pointer.  On x86_64 __builtin_va_list is an exception to this rule, but
semantically it's still treated as a static array.

This makes certain assignment operations fail due to a mismatch in types.
As all examples in the test program are rejected by C/C++ front-ends,
these are now errors in D too to be consistent.

PR d/110712

gcc/d/ChangeLog:

* d-codegen.cc (d_build_call): Update call to convert_for_argument.
* d-convert.cc (is_valist_parameter_type): New function.
(check_valist_conversion): New function.
(convert_for_assignment): Update signature.  Add check whether
assigning va_list is permissible.
(convert_for_argument): Likewise.
* d-tree.h (convert_for_assignment): Update signature.
(convert_for_argument): Likewise.
* expr.cc (ExprVisitor::visit (AssignExp *)): Update call to
convert_for_assignment.

gcc/testsuite/ChangeLog:

* gdc.dg/pr110712.d: New test.

21 months agod: Merge upstream dmd, druntime e48bc0987d, phobos 2458e8f82.
Iain Buclaw [Sun, 29 Oct 2023 15:39:05 +0000 (16:39 +0100)] 
d: Merge upstream dmd, druntime e48bc0987d, phobos 2458e8f82.

D front-end changes:

    - Import dmd v2.106.0-beta.1.

D runtime changes:

    - Import druntime v2.106.0-beta.1.

Phobos changes:

    - Import phobos v2.106.0-beta.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd e48bc0987d.
* expr.cc (ExprVisitor::visit (NewExp *)): Update for new front-end
interface.
* runtime.def (NEWARRAYT): Remove.
(NEWARRAYIT): Remove.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime e48bc0987d.
* src/MERGE: Merge upstream phobos 2458e8f82.

21 months agotestsuite, X86, Darwin: Skip a test for mcmodel=large.
Iain Sandoe [Sat, 28 Oct 2023 18:42:21 +0000 (19:42 +0100)] 
testsuite, X86, Darwin: Skip a test for mcmodel=large.

The large model is not implemented so far for Darwin (and the
codegen will be different when it is).

gcc/testsuite/ChangeLog:

* gcc.target/i386/large-data.c: Skip for Darwin.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
21 months agotestsuite, X86, Darwin: Skip tests with incompatible output.
Iain Sandoe [Sat, 28 Oct 2023 18:22:27 +0000 (19:22 +0100)] 
testsuite, X86, Darwin: Skip tests with incompatible output.

Darwin platforms do not currently emit .cfi_xxx directives, so these
tests do not work there.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-interrupt-1.c: Skip for Darwin.
* gcc.target/i386/apx-push2pop2-1.c: Likewise.
* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
21 months agotree-optimization/109334: Improve computation for access attribute
Martin Uecker [Wed, 25 Oct 2023 21:24:34 +0000 (23:24 +0200)] 
tree-optimization/109334: Improve computation for access attribute

The fix for PR104970 restricted size computations to the case
where the access attribute was specified explicitly (no VLA).
It also restricted it to void pointers or elements with constant
sizes.  The second restriction is enough to fix the original bug.
Revert the first change to again allow size computations for VLA
parameters and for VLA parameters together with an explicit access
attribute.

gcc/ChangeLog:

PR tree-optimization/109334
* tree-object-size.cc (parm_object_size): Allow size
computation for implicit access attributes.

gcc/testsuite/ChangeLog:

PR tree-optimization/109334
* gcc.dg/builtin-dynamic-object-size-0.c
(test_parmsz_simple3): Supported again.
(test_parmsz_external4): New test.
* gcc.dg/builtin-dynamic-object-size-20.c: New test.
* gcc.dg/pr104970.c: New test.

21 months agogcc: xtensa: fix salt/saltu version check
Max Filippov [Thu, 7 Sep 2023 03:13:22 +0000 (20:13 -0700)] 
gcc: xtensa: fix salt/saltu version check

gcc/
* config/xtensa/xtensa.h (TARGET_SALT): Change HW version from
260000 (which corresponds to RF-2014.0) to 270000 (which
corresponds to RG-2015.0, the release where salt/saltu opcodes
were introduced).

21 months agoRISC-V: Fix one range-loop-construct warning of avlprop
Pan Li [Sat, 28 Oct 2023 14:48:58 +0000 (22:48 +0800)] 
RISC-V: Fix one range-loop-construct warning of avlprop

This patch fixes the following warning in avlprop.

../../gcc/config/riscv/riscv-avlprop.cc: In member function 'virtual
unsigned int pass_avlprop::execute(function*)':
../../gcc/config/riscv/riscv-avlprop.cc:346:23: error: loop variable
'candidate' creates a copy from type 'const std::pair<avlprop_type,
rtl_ssa::insn_info*>' [-Werror=range-loop-construct]
  346 |       for (const auto candidate : m_candidates)
      |                       ^~~~~~~~~
../../gcc/config/riscv/riscv-avlprop.cc:346:23: note: use reference type
to prevent copying
  346 |       for (const auto candidate : m_candidates)
      |                       ^~~~~~~~~
      |                       &

gcc/ChangeLog:

* config/riscv/riscv-avlprop.cc (pass_avlprop::execute): Use
reference type to prevent copying.

Signed-off-by: Pan Li <pan2.li@intel.com>
21 months agoDaily bump.
GCC Administrator [Sun, 29 Oct 2023 00:17:16 +0000 (00:17 +0000)] 
Daily bump.

21 months agod: Fix ICE: in verify_gimple_in_seq on powerpc-darwin9 [PR112270]
Iain Buclaw [Sat, 28 Oct 2023 22:27:49 +0000 (00:27 +0200)] 
d: Fix ICE: in verify_gimple_in_seq on powerpc-darwin9 [PR112270]

This ICE was seen during stage2 on powerpc-darwin9 only.  There were
still some uses of GCC's boolean_type_node in the D front-end, which
caused a type mismatch to trigger as D bool size is fixed to 1 byte on
all targets.

So two new nodes have been introduced - d_bool_false_node and
d_bool_true_node - which have replaced all remaining uses of
boolean_false_node and boolean_true_node respectively.

PR d/112270

gcc/d/ChangeLog:

* d-builtins.cc (d_build_d_type_nodes): Initialize d_bool_false_node,
d_bool_true_node.
* d-codegen.cc (build_array_struct_comparison): Use d_bool_false_node
instead of boolean_false_node.
* d-convert.cc (d_truthvalue_conversion): Use d_bool_false_node and
d_bool_true_node instead of boolean_false_node and boolean_true_node.
* d-tree.h (enum d_tree_index): Add DTI_BOOL_FALSE and DTI_BOOL_TRUE.
(d_bool_false_node): New macro.
(d_bool_true_node): New macro.
* modules.cc (build_dso_cdtor_fn): Use d_bool_false_node and
d_bool_true_node instead of boolean_false_node and boolean_true_node.
(register_moduleinfo): Use d_bool_type instead of boolean_type_node.

gcc/testsuite/ChangeLog:

* gdc.dg/pr112270.d: New test.

21 months agod: Add warning for call expression without side effects
Iain Buclaw [Sat, 28 Oct 2023 07:42:15 +0000 (09:42 +0200)] 
d: Add warning for call expression without side effects

In the last merge of the dmd front-end with upstream (r14-4830), this
warning got removed from the semantic passes.  Reimplement the warning
for the code generation pass instead, where it cannot have an effect on
conditional compilation.

gcc/d/ChangeLog:

* d-codegen.cc (call_side_effect_free_p): New function.
* d-tree.h (CALL_EXPR_WARN_IF_UNUSED): New macro.
(call_side_effect_free_p): New prototype.
* expr.cc (ExprVisitor::visit (CallExp *)): Set
CALL_EXPR_WARN_IF_UNUSED on matched call expressions.
(ExprVisitor::visit (NewExp *)): Don't dereference the result of an
allocation call here.
* toir.cc (add_stmt): Emit warning when call expression added to
statement list without being used.

gcc/testsuite/ChangeLog:

* gdc.dg/Wunused_value.d: New test.

21 months agoDaily bump.
GCC Administrator [Sat, 28 Oct 2023 00:16:37 +0000 (00:16 +0000)] 
Daily bump.

21 months ago[RA]: Fixing i686 bootstrap failure because of pushing the equivalence patch
Vladimir N. Makarov [Fri, 27 Oct 2023 18:50:40 +0000 (14:50 -0400)] 
[RA]: Fixing i686 bootstrap failure because of pushing the equivalence patch

GCC with my recent patch improving cost calculation for pseudos with
equivalence may generate different code with and without debug info,
and as a result bootstrap fails on i686.  The patch fixes this
bug.

gcc/ChangeLog:

PR rtl-optimization/112107
* ira-costs.cc: (calculate_equiv_gains): Use NONDEBUG_INSN_P
instead of INSN_P.

21 months agoRISC-V: Make stack_save_restore_2 more robust
Patrick O'Neill [Fri, 27 Oct 2023 17:50:28 +0000 (10:50 -0700)] 
RISC-V: Make stack_save_restore_2 more robust

GCC recently changed to emit __riscv_restore_5 which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any number after __riscv_save_ and __riscv_restore_.
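
The actual test uses DejaGnu scan directives, which are not reproduced
here.  As an illustration only, the relaxed matching can be sketched with
POSIX regcomp/regexec; the helper name and the exact pattern are
assumptions:

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the line references __riscv_save_N or
   __riscv_restore_N for any decimal number N.  */
bool calls_save_restore(const char *asm_line)
{
    regex_t re;
    if (regcomp(&re, "__riscv_(save|restore)_[0-9]+",
                REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool matched = regexec(&re, asm_line, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```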

gcc/testsuite/ChangeLog:

* gcc.target/riscv/stack_save_restore_2.c: Accept any number
after __riscv_save_ and __riscv_restore_.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
21 months agoPR modula2/112110: fails to build on freebsd when compiling wrapclock.cc
Gaius Mulley [Fri, 27 Oct 2023 17:42:09 +0000 (18:42 +0100)] 
PR modula2/112110: fails to build on freebsd when compiling wrapclock.cc

This patch fixes a mangled #if #endif conditional section within
wrapclock.cc.  The conditional section in wrapclock_timezone
should return 0 rather than return timezone.

libgm2/ChangeLog:

PR modula2/112110
* libm2iso/wrapclock.cc (timezone): Return 0 if unable
to get the timezone from the tm struct.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
21 months agoFortran: diagnostics of MODULE PROCEDURE declaration conflicts [PR104649]
Harald Anlauf [Thu, 26 Oct 2023 20:32:35 +0000 (22:32 +0200)] 
Fortran: diagnostics of MODULE PROCEDURE declaration conflicts [PR104649]

gcc/fortran/ChangeLog:

PR fortran/104649
* decl.cc (gfc_match_formal_arglist): Handle conflicting declarations
of a MODULE PROCEDURE when one of the declarations is an alternate
return.

gcc/testsuite/ChangeLog:

PR fortran/104649
* gfortran.dg/pr104649.f90: New test.

Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
21 months agoamdgcn: Fix bug in gfx1030 support patch
Andrew Stubbs [Fri, 27 Oct 2023 16:53:10 +0000 (17:53 +0100)] 
amdgcn: Fix bug in gfx1030 support patch

The previous patch to add gfx1030 support introduced an issue with passing
exit codes from kernels run under gcn-run (offload kernels were unaffected).

gcc/ChangeLog:

PR target/112088
* config/gcn/gcn.cc (gcn_expand_epilogue): Fix kernel epilogue register
conflict.

21 months agoamdgcn: silence warnings
Andrew Stubbs [Fri, 27 Oct 2023 10:37:07 +0000 (11:37 +0100)] 
amdgcn: silence warnings

The operands really should be VOIDmode, so the warnings are false.

gcc/ChangeLog:

* config/gcn/gcn-valu.md
(vec_extract<V_1REG:mode><V_1REG_ALT:mode>_nop): Mention "operands" in
condition to silence the warnings.
(vec_extract<V_2REG:mode><V_2REG_ALT:mode>_nop): Likewise.
* config/gcn/gcn.md (*movti_insn): Likewise.

21 months agorecog: Fix propagation into ASM_OPERANDS
Richard Sandiford [Fri, 27 Oct 2023 15:37:11 +0000 (16:37 +0100)] 
recog: Fix propagation into ASM_OPERANDS

An inline asm with multiple output operands is represented as a
parallel set in which the SET_SRCs are the same (shared) ASM_OPERANDS.
insn_propagation didn't account for this, and instead propagated
into each ASM_OPERANDS individually.  This meant that it could
apply a substitution X->Y to Y itself, which (a) could create
circularity and (b) would be semantically wrong in any case,
since Y might use a different value of X.

This patch checks explicitly for parallels involving ASM_OPERANDS,
just like combine does.

gcc/
* recog.cc (insn_propagation::apply_to_pattern_1): Handle shared
ASM_OPERANDS.

21 months agoc++: another build_new_1 folding fix [PR111929]
Patrick Palka [Fri, 27 Oct 2023 15:31:02 +0000 (11:31 -0400)] 
c++: another build_new_1 folding fix [PR111929]

In build_new_1, we also need to avoid folding 'outer_nelts_check' when
in a template context to prevent an ICE on the below testcase.  This
patch replaces the problematic fold_build2 call with build2 (we'll later
fold it if appropriate during cp_fully_fold).

In passing, this patch removes an unnecessary conversion of 'nelts'
since it should always already be a size_t (and 'convert' isn't the best
conversion entry point to use anyway since it lacks a complain parameter).

PR c++/111929

gcc/cp/ChangeLog:

* init.cc (build_new_1): Remove unnecessary call to convert
on 'nelts'.  Use build2 instead of fold_build2 for
'outer_nelts_checks'.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent28a.C: New test.

21 months agoc++: add testcase verifying non-dep new-expr checking
Patrick Palka [Fri, 27 Oct 2023 15:26:40 +0000 (11:26 -0400)] 
c++: add testcase verifying non-dep new-expr checking

gcc/testsuite/ChangeLog:

* g++.dg/template/new14.C: New test.

21 months agoc++: more ahead-of-time -Wparentheses warnings
Patrick Palka [Fri, 27 Oct 2023 15:14:04 +0000 (11:14 -0400)] 
c++: more ahead-of-time -Wparentheses warnings

Now that we don't have to worry about looking through NON_DEPENDENT_EXPR,
we can easily extend the -Wparentheses warning in convert_for_assignment
to consider (non-dependent) templated assignment operator expressions as
well, like r14-4111-g6e92a6a2a72d3b did in maybe_convert_cond.

gcc/cp/ChangeLog:

* cp-tree.h (maybe_warn_unparenthesized_assignment): Declare.
* semantics.cc (is_assignment_op_expr_p): Generalize to return
true for any assignment operator expression, not just one that
has been resolved to an operator overload.
(maybe_warn_unparenthesized_assignment): Factored out from ...
(maybe_convert_cond): ... here.
(finish_parenthesized_expr): Mention
maybe_warn_unparenthesized_assignment.
* typeck.cc (convert_for_assignment): Replace -Wparentheses
warning logic with maybe_warn_unparenthesized_assignment.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wparentheses-13.C: Strengthen by expecting that
we issue the -Wparentheses warnings ahead of time.
* g++.dg/warn/Wparentheses-23.C: Likewise.
* g++.dg/warn/Wparentheses-32.C: Remove xfails.

21 months agoPR modula2/111530: Build failure on BSD due to getopt_long_only GNU extension dependency
Gaius Mulley [Fri, 27 Oct 2023 14:54:48 +0000 (15:54 +0100)] 
PR modula2/111530: Build failure on BSD due to getopt_long_only GNU extension dependency

This patch uses the libiberty getopt long functions (wrapped up inside
libgm2/libm2pim/cgetopt.cc) and only enables this implementation if
libgm2/configure.ac detects no getopt_long and friends on the target.

gcc/m2/ChangeLog:

PR modula2/111530
* gm2-libs-ch/cgetopt.c (cgetopt_cgetopt_long): Re-format.
(cgetopt_cgetopt_long_only): Re-format.
(cgetopt_SetOption):  Re-format and assign flag to NULL
if name is also NULL.
* gm2-libs/GetOpt.def (AddLongOption): Add index parameter
and change flag to be a VAR parameter rather than a pointer.
(GetOptLong): Re-format.
(GetOpt): Correct comment.
* gm2-libs/GetOpt.mod: Re-write to rely on cgetopt rather
than implement long option creation in GetOpt.
* gm2-libs/cgetopt.def (SetOption): has_arg type is INTEGER.

libgm2/ChangeLog:

PR modula2/111530
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac (AC_CHECK_HEADERS): Include getopt.h.
(GM2_CHECK_LIB): getopt_long check.
(GM2_CHECK_LIB): getopt_long_only check.
* libm2cor/Makefile.in: Regenerate.
* libm2iso/Makefile.in: Regenerate.
* libm2log/Makefile.in: Regenerate.
* libm2min/Makefile.in: Regenerate.
* libm2pim/Makefile.in: Regenerate.
* libm2pim/cgetopt.cc: Re-write using conditional on configure
and long function code from libiberty/getopt.c.

gcc/testsuite/ChangeLog:

PR modula2/111530
* gm2/pimlib/run/pass/testgetopt.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
21 months ago[PATCH] RISC-V: Fix wrong tune parameters on int_div
Yangyu Chen [Fri, 27 Oct 2023 14:39:26 +0000 (08:39 -0600)] 
[PATCH] RISC-V: Fix wrong tune parameters on int_div

This patch fixes an issue with the cost on "int_div" in various RISC-V
tune parameters including those for Rocket, SiFive U7 series, and T-Head
C906. This incorrect cost value interferes with the optimization process.
For example, it prevents the optimization of division by a constant to a
more efficient method known as Barrett reduction. This lack of
optimization negatively affects the performance of these systems.

The integer div cost of the Rocket and SiFive U7 is taken from the
Rocket-Chip Divider source code[1] with the BigCore configuration[2], which
leaves divUnroll at its default value of 1.  Thus, the maximum int_div
cycle count should be dataWidth + 1, which is 33 for 32-bit and 65 for
64-bit.

As for the C906, the divider takes 2 cycles to start[3] and produces 2 bits
of the result each cycle[4].  Thus, the maximum int_div cycle count should be
dataWidth / 2 + 2, which is 18 for 32-bit and 34 for 64-bit.
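
The two worst-case formulas above can be written out as a small sketch;
the function names are hypothetical and only restate the arithmetic from
this message:

```c
/* Rocket / SiFive U7 with divUnroll == 1: one bit per cycle plus one.  */
unsigned rocket_int_div_cycles(unsigned data_width)
{
    return data_width + 1;
}

/* C906: two start-up cycles, then two result bits per cycle.  */
unsigned c906_int_div_cycles(unsigned data_width)
{
    return data_width / 2 + 2;
}
```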

I also tested the performance on a VisionFive2, which has a quad-core SiFive U74.
I wrote a simple C program that performs 1e8 int32 divisions by the constant 6.  It
takes 1.998s using div and 0.420s using Barrett reduction to replace div with mul,
which is 4.75x faster.
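
For readers unfamiliar with the technique, here is a minimal sketch of
replacing a division by the constant 6 with a multiplication and a shift.
It is not the benchmark program, and it uses unsigned 32-bit operands
rather than int32 for simplicity; the helper name is hypothetical.  The
constant 0xAAAAAAAB is ceil(2^33 / 3), so (x * 0xAAAAAAAB) >> 33 equals
x / 3, and one further shift gives x / 6.

```c
#include <stdint.h>

/* Unsigned division by 6 using a 64-bit multiply and a shift.  */
uint32_t div6_by_mul(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 34);
}
```

With a correct int_div cost, the compiler can make this kind of
substitution itself when dividing by a constant.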

[1] https://github.com/chipsalliance/rocket-chip/blob/v1.6/src/main/scala/rocket/Multiplier.scala#L40
[2] https://github.com/chipsalliance/rocket-chip/blob/v1.6/src/main/scala/subsystem/Configs.scala#L97
[3] https://github.com/T-head-Semi/openc906/blob/af5614d72de7e5a4b8609c427d2e20af1deb21c4/C906_RTL_FACTORY/gen_rtl/iu/rtl/aq_iu_div.v#L267
[4] https://github.com/T-head-Semi/openc906/blob/af5614d72de7e5a4b8609c427d2e20af1deb21c4/C906_RTL_FACTORY/gen_rtl/iu/rtl/aq_iu_div_shift2_kernel.v#L93

gcc/ChangeLog:

* config/riscv/riscv.cc (rocket_tune_info): Fix int_div cost.
(sifive_7_tune_info, thead_c906_tune_info): Likewise.

21 months agoRISC-V: Add rawmemchr expander.
Robin Dapp [Tue, 24 Oct 2023 08:33:15 +0000 (10:33 +0200)] 
RISC-V: Add rawmemchr expander.

This patch adds a vectorized rawmemchr expander.  It also moves the
vectorized expand_block_move to riscv-string.cc.

gcc/ChangeLog:

* config/riscv/autovec.md (rawmemchr<ANYI:mode>): New expander.
* config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx):
Define.
(expand_rawmemchr): Define.
* config/riscv/riscv-v.cc (force_vector_length_operand): Remove
static.
(expand_block_move): Move from here...
* config/riscv/riscv-string.cc (expand_block_move): ...to here.
(expand_rawmemchr): Add vectorized expander.
* internal-fn.cc (expand_RAWMEMCHR): Fix typo.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-prof/peel-2.c: Add
-fno-tree-loop-distribute-patterns.
* gcc.dg/tree-ssa/ldist-rawmemchr-1.c: Add riscv.
* gcc.dg/tree-ssa/ldist-rawmemchr-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add builtin directory.
* gcc.target/riscv/rvv/autovec/builtin/rawmemchr-1.c: New test.

21 months agoRISC-V: Fix cond_sqrt tests.
Robin Dapp [Thu, 26 Oct 2023 18:40:00 +0000 (20:40 +0200)] 
RISC-V: Fix cond_sqrt tests.

As long as we do not have universal Zvfh support in binutils,
linking against libm does not work out of the box.  This patch
splits the cond_sqrt tests into non-zvfh and zvfh variants and
makes the run-zvfh ones depend on a zvfh target.

While at it, I also added Zvfh handling to the testsuite helpers.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: Remove
Float16.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: Ditto.
* lib/target-supports.exp: Add zvfh handling.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-zvfh-2.c: New test.

21 months ago[RA]: Add cost calculation for reg equivalence invariants
Vladimir N. Makarov [Fri, 27 Oct 2023 12:28:24 +0000 (08:28 -0400)] 
[RA]: Add cost calculation for reg equivalence invariants

My recent patch improving cost calculation for pseudos with equivalence
resulted in failure of gcc.target/arm/eliminate.c on aarch64.  This patch
fixes this failure.

gcc/ChangeLog:

* ira-costs.cc (get_equiv_regno, calculate_equiv_gains):
Process reg equivalence invariants.

21 months agoi386: Fix typo in "partial_memory_read_stall" tune option.
Uros Bizjak [Fri, 27 Oct 2023 13:33:55 +0000 (15:33 +0200)] 
i386: Fix typo in "partial_memory_read_stall" tune option.

gcc/ChangeLog:

* config/i386/x86-tune.def (X86_TUNE_PARTIAL_MEMORY_READ_STALL):
i386: Fix typo in "partial_memory_read_stall" tune option.

21 months agoMove OpenMP tests to gomp subdir
Paul-Antoine Arras [Fri, 27 Oct 2023 10:30:26 +0000 (12:30 +0200)] 
Move OpenMP tests to gomp subdir

gcc/testsuite/ChangeLog:

* gfortran.dg/c_ptr_tests_20.f90: Moved to...
* gfortran.dg/gomp/c_ptr_tests_20.f90: ...here.
* gfortran.dg/c_ptr_tests_21.f90: Moved to...
* gfortran.dg/gomp/c_ptr_tests_21.f90: ...here.

21 months agoaarch64: Add basic target_print_operand support for CONST_STRING
Victor Do Nascimento [Fri, 7 Jul 2023 12:08:45 +0000 (13:08 +0100)] 
aarch64: Add basic target_print_operand support for CONST_STRING

Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.

Consequently, an rtx such as:

  (set (reg/i:DI 0 x0)
         (unspec:DI [(const_string ("s3_3_c13_c2_2"))])

can now be output correctly using the following output pattern when
composing `define_insn's:

  "mrs\t%x0, %1"

gcc/ChangeLog

* config/aarch64/aarch64.cc (aarch64_print_operand): Add
support for CONST_STRING.

21 months agoPR target/110551: Fix reg allocation for widening multiplications on x86.
Roger Sayle [Fri, 27 Oct 2023 09:03:53 +0000 (10:03 +0100)] 
PR target/110551: Fix reg allocation for widening multiplications on x86.

This patch contains clean-ups of the widening multiplication patterns in
i386.md, and provides variants of the existing highpart multiplication
peephole2 transformations (that tidy up register allocation after
reload), and thereby fixes PR target/110551, which is a superfluous
move instruction.

For the new test case, compiled on x86_64 with -O2:

Before:
mulx64: movabsq $-7046029254386353131, %rcx
        movq    %rcx, %rax
        mulq    %rdi
        xorq    %rdx, %rax
        ret

After:
mulx64: movabsq $-7046029254386353131, %rax
        mulq    %rdi
        xorq    %rdx, %rax
        ret
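
A C function of roughly the following shape would be expected to produce code like the above. This is a reconstruction from the assembly, not a copy of pr110551.c, and it relies on the GCC __uint128_t extension for 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/* Widening 64x64->128 multiply by a constant, then fold the high
   and low halves together.  The constant is the unsigned form of
   the -7046029254386353131 immediate seen in the movabsq.  */
uint64_t mulx64 (uint64_t x)
{
  __uint128_t r = (__uint128_t) x * UINT64_C (0x9E3779B97F4A7C15);
  return (uint64_t) r ^ (uint64_t) (r >> 64);
}
```

Because mulq takes one operand implicitly in %rax, loading the constant straight into %rax is what removes the extra move.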

The clean-ups are (i) that operand 1 is consistently made register_operand
and operand 2 becomes nonimmediate_operand, so that predicates match the
constraints, (ii) the representation of the BMI2 mulx instruction is
updated to use the new umul_highpart RTX, and (iii) because operands
0 and 1 have different modes in widening multiplications, "a" is a more
appropriate constraint than "0" (which avoids spills/reloads containing
SUBREGs).  The new peephole2 transformations are based upon those at
around line 9951 of i386.md, which begin with the comment
;; Highpart multiplication peephole2s to tweak register allocation.
;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx  ->  mov imm,%rax; imulq %rdi

2023-10-27  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/110551
* config/i386/i386.md (<u>mul<mode><dwi>3): Make operands 1 and
2 take "register_operand" and "nonimmediate_operand" respectively.
(<u>mulqihi3): Likewise.
(*bmi2_umul<mode><dwi>3_1): Operand 2 needs to be register_operand
matching the %d constraint.  Use umul_highpart RTX to represent
the highpart multiplication.
(*umul<mode><dwi>3_1):  Operand 2 should use register_operand
predicate, and "a" rather than "0" as operands 0 and 2 have
different modes.
(define_split): For mul to mulx conversion, use the new
umul_highpart RTX representation.
(*mul<mode><dwi>3_1):  Operand 1 should be register_operand
and the constraint %a as operands 0 and 1 have different modes.
(*<u>mulqihi3_1): Operand 1 should be register_operand matching
the constraint %0.
(define_peephole2): Provide widening multiplication variants
of the peephole2s that tweak highpart multiplication register
allocation.

gcc/testsuite/ChangeLog
PR target/110551
* gcc.target/i386/pr110551.c: New test case.

21 months agopreprocessor: c++: Support `#pragma GCC target' macros [PR87299]
Lewis Hyatt [Fri, 27 Oct 2023 08:32:50 +0000 (04:32 -0400)] 
preprocessor: c++: Support `#pragma GCC target' macros [PR87299]

`#pragma GCC target' is not currently handled in preprocess-only mode (e.g.,
when running gcc -E or gcc -save-temps). As noted in the PR, this means that
if the target pragma defines any macros, those macros are not effective in
preprocess-only mode. Similarly, such macros are not effective when
compiling with C++ (even when compiling without -save-temps), because C++
does not process the pragma until after all tokens have been obtained from
libcpp, at which point it is too late for macro expansion to take place.

Since r13-1544 and r14-2893, there is a general mechanism to handle pragmas
under these conditions as well, so resolve the PR by using the new "early
pragma" support.

toplev.cc required some changes because the target-specific handlers for
`#pragma GCC target' may call target_reinit(), and toplev.cc was not expecting
that function to be called in preprocess-only mode.

I added some additional testcases from the PR for x86. The other targets
that support `#pragma GCC target' (aarch64, arm, nios2, powerpc, s390)
already had tests verifying that the pragma sets macros as expected; here I
have added -save-temps versions of some of them, to test that they now work
in preprocess-only mode as well.
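
The kind of code affected can be sketched as below. This is an illustrative x86-only fragment, not one of the new testcases; the names AVX2_MACRO_VISIBLE and avx2_macro_visible are hypothetical, and on other compilers or targets the fragment deliberately reports -1 ("not applicable").

```c
#include <assert.h>

#if defined(__GNUC__) && !defined(__clang__) \
    && (defined(__x86_64__) || defined(__i386__))
#pragma GCC push_options
#pragma GCC target("avx2")
/* The pragma is expected to define __AVX2__ from here until the
   matching pop_options.  */
#ifdef __AVX2__
#define AVX2_MACRO_VISIBLE 1
#else
#define AVX2_MACRO_VISIBLE 0
#endif
#pragma GCC pop_options
#else
#define AVX2_MACRO_VISIBLE (-1)  /* not GCC on x86: nothing to check */
#endif

/* 1 if the macro set by the pragma was seen by the preprocessor.  */
int avx2_macro_visible (void)
{
  return AVX2_MACRO_VISIBLE;
}
```

Before this change, running only the preprocessor (gcc -E) or compiling as C++ would take the #else branch of the #ifdef, which is the behaviour the PR describes.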

gcc/c-family/ChangeLog:

PR preprocessor/87299
* c-pragma.cc (init_pragma): Register `#pragma GCC target' and
related pragmas in preprocess-only mode, and enable early handling.
(c_reset_target_pragmas): New function refactoring code from...
(handle_pragma_reset_options): ...here.
* c-pragma.h (c_reset_target_pragmas): Declare.

gcc/cp/ChangeLog:

PR preprocessor/87299
* parser.cc (cp_lexer_new_main): Call c_reset_target_pragmas ()
after preprocessing is complete, before starting compilation.

gcc/ChangeLog:

PR preprocessor/87299
* toplev.cc (no_backend): New static global.
(finalize): Remove argument no_backend, which is now a
static global.
(process_options): Likewise.
(do_compile): Likewise.
(target_reinit): Don't do anything in preprocess-only mode.
(toplev::main): Adapt to no_backend change.
(toplev::finalize): Likewise.

gcc/testsuite/ChangeLog:

PR preprocessor/87299
* c-c++-common/pragma-target-1.c: New test.
* c-c++-common/pragma-target-2.c: New test.
* g++.target/i386/pr87299-1.C: New test.
* g++.target/i386/pr87299-2.C: New test.
* gcc.target/i386/pr87299-1.c: New test.
* gcc.target/i386/pr87299-2.c: New test.
* gcc.target/s390/target-attribute/tattr-2b.c: New test.
* gcc.target/aarch64/pragma_cpp_predefs_1b.c: New test.
* gcc.target/arm/pragma_arch_attribute_1b.c: New test.
* gcc.target/nios2/custom-fp-2b.c: New test.
* gcc.target/powerpc/float128-3b.c: New test.

21 months agoFortran: Fix some problems with SELECT TYPE selectors [PR104625].
Paul Thomas [Fri, 27 Oct 2023 08:33:38 +0000 (09:33 +0100)] 
Fortran: Fix some problems with SELECT TYPE selectors [PR104625].

2023-10-27  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
PR fortran/104625
* expr.cc (gfc_check_vardef_context): Check that the target
does have a vector index before emitting the specific error.
* match.cc (copy_ts_from_selector_to_associate): Ensure that
class valued operator expressions set the selector rank and
use the rank to provide the associate variable with an
appropriate array spec.
* resolve.cc (resolve_operator): Reduce stacked parentheses to
a single pair.
(fixup_array_ref): Extract selector symbol from parentheses.

gcc/testsuite/
PR fortran/104625
* gfortran.dg/pr104625.f90: New test.
* gfortran.dg/associate_55.f90: Change error check.

21 months agoMATCH: Simplify `(X &| B) CMP X` if possible [PR 101590]
Andrew Pinski [Wed, 13 Sep 2023 01:24:22 +0000 (18:24 -0700)] 
MATCH: Simplify `(X &| B) CMP X` if possible [PR 101590]

I noticed we were missing these simplifications so let's add them.

This adds the following simplifications:
U & N <= U  -> true
U & N >  U  -> false
when U is known to be non-negative.

When N is also known to be non-negative, this is also true:
U | N <  U  -> false
U | N >= U  -> true

When N is a negative integer, the result flips and we get:
U | N <  U  -> true
U | N >= U  -> false

We could later extend this to the case where N is not a constant
but is known to be negative.
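
The identities can be sanity-checked with a small brute-force loop such as the one below. It is only a sketch (the function name is hypothetical) and assumes the usual two's complement representation of negative ints.

```c
#include <assert.h>

/* Count the (u, n) pairs that violate any of the identities, with
   u in [0, limit] (non-negative) and n in [-limit, limit].  */
static int count_bitcmp_violations (int limit)
{
  assert (limit >= 0);
  int bad = 0;
  for (int u = 0; u <= limit; u++)
    for (int n = -limit; n <= limit; n++)
      {
        bad += !((u & n) <= u);       /* U & N <= U  -> true  */
        bad += ((u & n) > u);         /* U & N >  U  -> false */
        if (n >= 0)
          {
            bad += ((u | n) < u);     /* U | N <  U  -> false */
            bad += !((u | n) >= u);   /* U | N >= U  -> true  */
          }
        else
          {
            bad += !((u | n) < u);    /* U | N <  U  -> true  */
            bad += ((u | n) >= u);    /* U | N >= U  -> false */
          }
      }
  return bad;
}
```

A negative N sets the sign bit of U | N, so the result is negative and therefore below any non-negative U, which is why the comparison flips.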

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/101590
PR tree-optimization/94884

gcc/ChangeLog:

* match.pd (`(X BIT_OP Y) CMP X`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitcmp-1.c: New test.
* gcc.dg/tree-ssa/bitcmp-2.c: New test.
* gcc.dg/tree-ssa/bitcmp-3.c: New test.
* gcc.dg/tree-ssa/bitcmp-4.c: New test.
* gcc.dg/tree-ssa/bitcmp-5.c: New test.
* gcc.dg/tree-ssa/bitcmp-6.c: New test.

21 months agoSupport vec_cmpmn/vcondmn for v2hf/v4hf.
liuhongt [Mon, 23 Oct 2023 05:40:10 +0000 (13:40 +0800)] 
Support vec_cmpmn/vcondmn for v2hf/v4hf.

gcc/ChangeLog:

PR target/103861
* config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle
V2HF/V2BF/V4HF/V4BFmode.
* config/i386/i386.cc (ix86_get_mask_mode): Return QImode when
data_mode is V4HF/V2HFmode.
* config/i386/mmx.md (vec_cmpv4hfqi): New expander.
(vcond_mask_<mode>v4hi): Ditto.
(vcond_mask_<mode>qi): Ditto.
(vec_cmpv2hfqi): Ditto.
(vcond_mask_<mode>v2hi): Ditto.
(mmx_pblendvb_<mode>): Add 2 combine splitters after the
patterns.
(mmx_pblendvb_v8qi): Ditto.
(<code>v2hi3): Add a combine splitter after the pattern.
(<code><mode>3): Ditto.
(<code>v8qi3): Ditto.
(<code><mode>3): Ditto.
* config/i386/sse.md (vcond<mode><mode>): Merge this with ..
(vcond<sseintvecmodelower><mode>): .. this into ..
(vcond<VI2HFBF_AVX512VL:mode><VHF_AVX512VL:mode>): .. this,
and extend to V8BF/V16BF/V32BFmode.

gcc/testsuite/ChangeLog:

* g++.target/i386/part-vect-vcondhf.C: New test.
* gcc.target/i386/part-vect-vec_cmphf.c: New test.

21 months agoDaily bump.
GCC Administrator [Fri, 27 Oct 2023 00:17:12 +0000 (00:17 +0000)] 
Daily bump.

21 months agoRISC-V: Move lmul calculation into macro
Juzhe-Zhong [Thu, 26 Oct 2023 22:28:56 +0000 (06:28 +0800)] 
RISC-V: Move lmul calculation into macro

We calculate LMUL according to --param=riscv-autovec-lmul
in multiple places:

  int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;

Create a new macro for it to ease maintenance.
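
A standalone sketch of such a macro follows. The enum values and the global are stand-ins assumed for this example so that it compiles on its own, and max_lmul_for is a hypothetical helper; the real definitions live in the RISC-V backend headers.

```c
#include <assert.h>

/* Assumed stand-ins for the backend's LMUL settings.  */
enum riscv_autovec_lmul_enum {
  RVV_M1 = 1, RVV_M2 = 2, RVV_M4 = 4, RVV_M8 = 8,
  RVV_DYNAMIC = 9
};

/* Stand-in for the value of --param=riscv-autovec-lmul.  */
static enum riscv_autovec_lmul_enum riscv_autovec_lmul = RVV_M1;

/* Dynamic LMUL means "up to M8"; otherwise use the requested value.  */
#define TARGET_MAX_LMUL \
  (int) (riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul)

/* Hypothetical helper: evaluate the macro for a given setting.  */
static int max_lmul_for (enum riscv_autovec_lmul_enum setting)
{
  riscv_autovec_lmul = setting;
  return TARGET_MAX_LMUL;
}
```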

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_MAX_LMUL): New macro.
* config/riscv/riscv-v.cc (preferred_simd_mode): Adapt macro.
(autovectorize_vector_modes): Ditto.
(can_find_related_mode_p): Ditto.

21 months agoRISC-V: Add AVL propagation PASS for RVV auto-vectorization
Juzhe-Zhong [Thu, 26 Oct 2023 08:13:51 +0000 (16:13 +0800)] 
RISC-V: Add AVL propagation PASS for RVV auto-vectorization

This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization,
a long-known issue that I have finally found the time to address.

Consider a simple vector addition operation:

https://godbolt.org/z/7hfGfEjW3

void
foo (int *__restrict a,
     int *__restrict b,
     int n)
{
  for (int i = 0; i < n; i++)
      a[i] = a[i] + b[i];
}

Optimized IR:

Loop body:
  _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);                          -> vsetvli a5,a2,e8,mf4,ta,ma
  ...
  vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);    -> vle32.v v2,0(a0)
  vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);   -> vle32.v v1,0(a1)
  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;                              -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
  .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);  -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)

We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The toggling happens because the simple PLUS_EXPR GIMPLE assignment carries no LEN information:

vect__7.12_19 = vect__6.11_20 + vect__4.8_27;

For partial vectorization, GCC applies partially predicated loads/stores and un-predicated full-vector operations.
This flow is used by all other targets, such as ARM SVE (RVV uses it as well):

ARM SVE:

.L3:
        ld1w    z30.s, p7/z, [x0, x3, lsl 2]   -> predicated load
        ld1w    z31.s, p7/z, [x1, x3, lsl 2]   -> predicated load
        add     z31.s, z31.s, z30.s            -> un-predicated add
        st1w    z31.s, p7, [x0, x3, lsl 2]     -> predicated store

This vectorization flow causes AVL/VL toggling on RVV, so we need an AVL propagation PASS for it.
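
Why it is safe to give the add the same length as the loads and stores can be modelled in plain C. The sketch below is a toy simulation under stated assumptions: VLMAX is a made-up lane count, the function name is hypothetical, and inactive lanes are modelled as zero, whereas real hardware may leave them with arbitrary values.

```c
#include <assert.h>
#include <string.h>

enum { VLMAX = 4 };  /* hypothetical number of int lanes per vector */

/* a[i] += b[i] for i in [0, n), strip-mined like a length-controlled
   loop.  If add_uses_vl is zero the add covers all VLMAX lanes
   (un-predicated); otherwise it covers only the vl active lanes,
   as if the AVL had been propagated into it.  */
static void vadd_strip_mined (int *a, const int *b, int n, int add_uses_vl)
{
  assert (n >= 0);
  while (n > 0)
    {
      int vl = n < VLMAX ? n : VLMAX;           /* models .SELECT_VL */
      int va[VLMAX] = { 0 }, vb[VLMAX] = { 0 }, vsum[VLMAX] = { 0 };
      memcpy (va, a, (size_t) vl * sizeof *a);  /* length-controlled load */
      memcpy (vb, b, (size_t) vl * sizeof *b);
      int add_len = add_uses_vl ? vl : VLMAX;
      for (int i = 0; i < add_len; i++)
        vsum[i] = va[i] + vb[i];
      memcpy (a, vsum, (size_t) vl * sizeof *a); /* length-controlled store */
      a += vl;
      b += vl;
      n -= vl;
    }
}
```

Only the first vl lanes are ever stored, so both variants leave memory in the same state; the add can therefore reuse the loop's vl and no vsetvli is needed between the loads and the add.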

Also, it is very unlikely that we can apply predicated operations to all vectorization, for the following reasons:

1. Supporting them for all vectorization is a very heavy workload, and we see no benefit if the target backend can handle the problem instead.
2. Changing the loop vectorizer for it would make the code base ugly and hard to maintain.
3. We would need patterns for all operations: not only COND_LEN_ADD, COND_LEN_SUB, ...,
   but also COND_LEN_EXTEND, ..., COND_LEN_CEIL, ..., which is over 100 patterns, an unreasonable number.

To conclude, we prefer un-predicated operations here, and design a clean AVL propagation PASS to elide the redundant vsetvls
caused by AVL/VL toggling.

The second question is why we add a separate AVL propagation PASS instead of optimizing this in the VSETVL PASS (we definitely could optimize AVL there).

Frankly, I was planning to address this issue in the VSETVL PASS, which is why we recently refactored it. However, I changed my mind after several
experiments.

The reasons are as follows:

1. Code base management and maintenance. The current VSETVL PASS is complicated enough and already has enough aggressive and fancy optimizations that
   it generates optimal code in most cases. It is not a good idea to keep adding features to the VSETVL PASS and make it
   heavier and heavier, as we would then need to refactor it again in the future.
   Actually, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we will not need to change it any more, except for minor
   fixes.

2. vsetvl insertion (which the VSETVL PASS does) and AVL propagation are 2 different things, and I don't think we should fuse them into the same PASS.

3. The VSETVL PASS is a post-RA PASS, whereas AVL propagation should be done before RA, where it can reduce register usage.

4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
   The code is only a few hundred lines, which is very manageable and easy to extend.
   We can easily add more AVL propagation to a clean, separate PASS in the future. (Doing it in the VSETVL PASS would complicate
   that PASS again, and it is already complicated.)

Here is an example to demonstrate more:

https://godbolt.org/z/bE86sv3q5

void foo2 (int *__restrict a,
          int *__restrict b,
          int *__restrict c,
          int *__restrict a2,
          int *__restrict b2,
          int *__restrict c2,
          int *__restrict a3,
          int *__restrict b3,
          int *__restrict c3,
          int *__restrict a4,
          int *__restrict b4,
          int *__restrict c4,
          int *__restrict a5,
          int *__restrict b5,
          int *__restrict c5,
          int n)
{
    for (int i = 0; i < n; i++){
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i]+ a[i];

      a[i] = a[i] + c[i];
      b5[i] = a[i] + c[i];
      a2[i] = a[i] + c2[i];
      a3[i] = a[i] + c3[i];
      a4[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i]+ a[i];
    }
}

1. Loop Body:

Before this patch:                                          After this patch:

      vsetvli a4,t1,e8,mf4,ta,ma                           vsetvli a4,t1,e32,m1,ta,ma
        vle32.v v2,0(a2)                                     vle32.v v2,0(a2)
        vle32.v v4,0(a1)                                     vle32.v v3,0(t2)
        vle32.v v1,0(t2)                                     vle32.v v4,0(a1)
        vsetvli a7,zero,e32,m1,ta,ma                         vle32.v v1,0(t0)
        vadd.vv v4,v2,v4                                     vadd.vv v4,v2,v4
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v1,v3,v1
        vle32.v v3,0(s0)                                     vadd.vv v1,v1,v4
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v1,v1,v4
        vadd.vv v1,v3,v1                                     vadd.vv v1,v1,v4
        vadd.vv v1,v1,v4                                     vadd.vv v1,v1,v2
        vadd.vv v1,v1,v4                                     vadd.vv v2,v1,v2
        vadd.vv v1,v1,v4                                     vse32.v v2,0(t5)
        vsetvli zero,a4,e32,m1,ta,ma                         vadd.vv v2,v2,v1
        vle32.v v4,0(a5)                                     vadd.vv v2,v2,v1
        vsetvli a7,zero,e32,m1,ta,ma                         slli a7,a4,2
        vadd.vv v1,v1,v2                                     vadd.vv v3,v1,v3
        vadd.vv v2,v1,v2                                     vle32.v v5,0(a5)
        vadd.vv v4,v1,v4                                     vle32.v v6,0(t6)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v3,0(t3)
        vse32.v v2,0(t5)                                     vse32.v v2,0(a0)
        vse32.v v4,0(a3)                                     vadd.vv v3,v3,v1
        vsetvli a7,zero,e32,m1,ta,ma                         vadd.vv v2,v1,v5
        vadd.vv v3,v1,v3                                     vse32.v v3,0(t4)
        vadd.vv v2,v2,v1                                     vadd.vv v1,v1,v6
        vadd.vv v2,v2,v1                                     vse32.v v2,0(a3)
        vsetvli zero,a4,e32,m1,ta,ma                         vse32.v v1,0(a6)
        vse32.v v2,0(a0)
        vse32.v v3,0(t3)
        vle32.v v2,0(t0)
        vsetvli a7,zero,e32,m1,ta,ma
        vadd.vv v3,v3,v1
        vsetvli zero,a4,e32,m1,ta,ma
        vse32.v v3,0(t4)
        vsetvli a7,zero,e32,m1,ta,ma
        slli    a7,a4,2
        vadd.vv v1,v1,v2
        sub     t1,t1,a4
        vsetvli zero,a4,e32,m1,ta,ma
        vse32.v v1,0(a6)

Clearly, all the heavy and redundant vsetvls inside the loop body are eliminated.

2. Epilogue:
    Before this patch:                                          After this patch:

     .L5:                                                      .L5:
        ld      s0,8(sp)                                         ret
        addi    sp,sp,16
        jr      ra

This is the benefit of doing the AVL propagation before RA: we eliminate the use of the 'a7' register,
which was used by the redundant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'

The final codegen after this patch:

foo2:
lw t1,56(sp)
ld t6,0(sp)
ld t3,8(sp)
ld t0,16(sp)
ld t2,24(sp)
ld t4,32(sp)
ld t5,40(sp)
ble t1,zero,.L5
.L3:
vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2)
vle32.v v3,0(t2)
vle32.v v4,0(a1)
vle32.v v1,0(t0)
vadd.vv v4,v2,v4
vadd.vv v1,v3,v1
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v2
vadd.vv v2,v1,v2
vse32.v v2,0(t5)
vadd.vv v2,v2,v1
vadd.vv v2,v2,v1
slli a7,a4,2
vadd.vv v3,v1,v3
vle32.v v5,0(a5)
vle32.v v6,0(t6)
vse32.v v3,0(t3)
vse32.v v2,0(a0)
vadd.vv v3,v3,v1
vadd.vv v2,v1,v5
vse32.v v3,0(t4)
vadd.vv v1,v1,v6
vse32.v v2,0(a3)
vse32.v v1,0(a6)
sub t1,t1,a4
add a1,a1,a7
add a2,a2,a7
add a5,a5,a7
add t6,t6,a7
add t0,t0,a7
add t2,t2,a7
add t5,t5,a7
add a3,a3,a7
add a6,a6,a7
add t3,t3,a7
add t4,t4,a7
add a0,a0,a7
bne t1,zero,.L3
.L5:
ret

PR target/111318
PR target/111888

gcc/ChangeLog:

* config.gcc: Add AVL propagation pass.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
* config/riscv/t-riscv: Ditto.
* config/riscv/riscv-avlprop.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111318.c: New test.
* gcc.target/riscv/rvv/autovec/pr111888.c: New test.
Tested-by: Patrick O'Neill <patrick@rivosinc.com>

21 months agolibstdc++: Fix exception thrown by std::shared_lock::unlock() [PR112089]
Jonathan Wakely [Thu, 26 Oct 2023 15:51:30 +0000 (16:51 +0100)] 
libstdc++: Fix exception thrown by std::shared_lock::unlock() [PR112089]

The incorrect errc constant here looks like a copy&paste error.

libstdc++-v3/ChangeLog:

PR libstdc++/112089
* include/std/shared_mutex (shared_lock::unlock): Change errc
constant to operation_not_permitted.
* testsuite/30_threads/shared_lock/locking/112089.cc: New test.

21 months agolibstdc++: Add dg-timeout-factor to <chrono> IO tests
Jonathan Wakely [Tue, 24 Oct 2023 20:50:17 +0000 (21:50 +0100)] 
libstdc++: Add dg-timeout-factor to <chrono> IO tests

This avoids failures due to compilation timeouts when testing with a low
tool_timeout value.

libstdc++-v3/ChangeLog:

* testsuite/20_util/duration/io.cc: Double timeout using
dg-timeout-factor.
* testsuite/std/time/day/io.cc: Likewise.
* testsuite/std/time/format.cc: Likewise.
* testsuite/std/time/hh_mm_ss/io.cc: Likewise.
* testsuite/std/time/month/io.cc: Likewise.
* testsuite/std/time/month_day/io.cc: Likewise.
* testsuite/std/time/month_day_last/io.cc: Likewise.
* testsuite/std/time/month_weekday/io.cc: Likewise.
* testsuite/std/time/month_weekday_last/io.cc: Likewise.
* testsuite/std/time/weekday/io.cc: Likewise.
* testsuite/std/time/weekday_indexed/io.cc: Likewise.
* testsuite/std/time/weekday_last/io.cc: Likewise.
* testsuite/std/time/year/io.cc: Likewise.
* testsuite/std/time/year_month/io.cc: Likewise.
* testsuite/std/time/year_month_day/io.cc: Likewise.
* testsuite/std/time/year_month_day_last/io.cc: Likewise.
* testsuite/std/time/year_month_weekday/io.cc: Likewise.
* testsuite/std/time/year_month_weekday_last/io.cc: Likewise.
* testsuite/std/time/zoned_time/io.cc: Likewise.

21 months agoAdd attribute((null_terminated_string_arg(PARAM_IDX)))
David Malcolm [Thu, 26 Oct 2023 19:56:13 +0000 (15:56 -0400)] 
Add attribute((null_terminated_string_arg(PARAM_IDX)))

This patch adds a new function attribute to GCC for marking that an
argument is expected to be a null-terminated string.

For example, consider:

  void test_a (const char *p)
    __attribute__((null_terminated_string_arg (1)));

which would indicate to humans and compilers that argument 1 of "test_a"
is expected to be a null-terminated string, with the idea:

- we should complain if it's not valid to read from *p up to the first
  '\0' character in the buffer

- we should complain if *p is not terminated, or if it's uninitialized
  before the first '\0' character

This is independent of the nonnull-ness of the pointer: if you also want
to express that the argument must be non-null, we already have
__attribute__((nonnull (N))), so the user can write e.g.:

  void test_b (const char *p)
    __attribute__((null_terminated_string_arg (1)))
    __attribute__((nonnull (1)));

which can also be spelled as:

  void test_b (const char *p)
     __attribute__((null_terminated_string_arg (1),
                    nonnull (1)));

For a function similar to strncpy, we can use the "access" attribute to
express a maximum size of the read:

  void test_c (const char *p, size_t sz)
     __attribute__((null_terminated_string_arg (1),
                    nonnull (1),
                    access (read_only, 1, 2)));

The patch implements:
(a) C/C++ frontends: recognition of this attribute
(b) analyzer: usage of this attribute
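
A usage sketch is shown below. The wrapper macro NUL_TERMINATED_ARG and the function count_chars are hypothetical names for this example; the __has_attribute guard is only there so that the fragment still compiles with compilers that lack the new attribute.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__has_attribute)
#  if __has_attribute(null_terminated_string_arg)
#    define NUL_TERMINATED_ARG(idx) \
       __attribute__((null_terminated_string_arg (idx)))
#  endif
#endif
#ifndef NUL_TERMINATED_ARG
#  define NUL_TERMINATED_ARG(idx)  /* attribute not available */
#endif

/* Argument 1 is documented as a null-terminated string.  */
size_t count_chars (const char *p) NUL_TERMINATED_ARG (1);

size_t count_chars (const char *p)
{
  assert (p != NULL);
  size_t n = 0;
  while (p[n] != '\0')  /* reads up to the first '\0' */
    n++;
  return n;
}

/* A call such as
     char buf[3] = { 'a', 'b', 'c' };
     count_chars (buf);
   passes an unterminated buffer, which is the kind of misuse the
   attribute is meant to let -fanalyzer report.  */
```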

gcc/analyzer/ChangeLog:
* region-model.cc
(region_model::check_external_function_for_access_attr): Split
out, replacing with...
(region_model::check_function_attr_access): ...this new function
and...
(region_model::check_function_attrs): ...this new function.
(region_model::check_one_function_attr_null_terminated_string_arg):
New.
(region_model::check_function_attr_null_terminated_string_arg):
New.
(region_model::handle_unrecognized_call): Update for renaming of
check_external_function_for_access_attr to check_function_attrs.
(region_model::check_for_null_terminated_string_arg): Add return
value to one overload.  Make both overloads const.
* region-model.h: Include "stringpool.h" and "attribs.h".
(region_model::check_for_null_terminated_string_arg): Add return
value to one overload.  Make both overloads const.
(region_model::check_external_function_for_access_attr): Delete
decl.
(region_model::check_function_attr_access): New decl.
(region_model::check_function_attr_null_terminated_string_arg):
New decl.
(region_model::check_one_function_attr_null_terminated_string_arg):
New decl.
(region_model::check_function_attrs): New decl.

gcc/c-family/ChangeLog:
* c-attribs.cc (c_common_attribute_table): Add
"null_terminated_string_arg".
(handle_null_terminated_string_arg_attribute): New.

gcc/ChangeLog:
* doc/extend.texi (Common Function Attributes): Add
null_terminated_string_arg.

gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/attr-null_terminated_string_arg-access-read_write.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-access-without-size.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-multiple.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull-2.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull-sized.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nullable-sized.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nullable.c:
New test.
* c-c++-common/attr-null_terminated_string_arg.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

21 months agotestsuite, aarch64: Normalise options to aarch64.exp.
Iain Sandoe [Thu, 26 Oct 2023 18:46:16 +0000 (19:46 +0100)] 
testsuite, aarch64: Normalise options to aarch64.exp.

When the compiler is configured --with-cpu= and that is different from
the baselines assumed, we see excess test fails (primarily in body code
scans which are necessarily sensitive to costs).  To stabilize the
testsuite against such changes, use aarch64-with-arch-dg-options ()
to provide suitable consistent defaults.

e.g. for --with-cpu=xgene1 we see over 100 excess fails which are
removed by this change.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/aarch64.exp: Use aarch64-with-arch-dg-options
to normalize the options to the tests in aarch64.exp.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

21 months agotestsuite, Darwin: Adjust target test for modern OS.
Iain Sandoe [Thu, 26 Oct 2023 19:10:01 +0000 (20:10 +0100)] 
testsuite, Darwin: Adjust target test for modern OS.

The same conditions on use of DYLD_LIBRARY_PATH apply to OS versions
11 to 14, so make the test general.

gcc/testsuite/ChangeLog:

* lib/target-libpath.exp: Skip DYLD_LIBRARY_PATH for all
current OS versions > 10.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

21 months agomatch: Simplify `a != C1 ? abs(a) : C2` when C2 == abs(C1) [PR111957]
Andrew Pinski [Tue, 24 Oct 2023 23:13:18 +0000 (23:13 +0000)] 
match: Simplify `a != C1 ? abs(a) : C2` when C2 == abs(C1) [PR111957]

This adds a match pattern for `a != C1 ? abs(a) : C2` which gets simplified
to `abs(a)`.  If C1 was originally *_MIN, change it over to use absu instead
of abs.
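
The equivalence can be illustrated in C as below. The function names and the choice of C1 = -5 are hypothetical; the absu helper is a hand-written stand-in for the unsigned-result absolute value, and converting its result back to int for INT_MIN relies on GCC's wrapping conversion.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* C1 = -5, C2 = abs (C1) = 5: the select is just abs (a).  */
static int select_form (int a)  { return a != -5 ? abs (a) : 5; }
static int folded_form (int a)  { return abs (a); }

/* Absolute value with an unsigned result, well defined for INT_MIN.  */
static unsigned absu (int a)
{
  return a < 0 ? 0u - (unsigned) a : (unsigned) a;
}

/* C1 = INT_MIN: abs (INT_MIN) would overflow, so the folded form
   goes through absu and converts back.  */
static int select_min_form (int a) { return a != INT_MIN ? abs (a) : INT_MIN; }
static int folded_min_form (int a) { return (int) absu (a); }
```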

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111957

gcc/ChangeLog:

* match.pd (`a != C1 ? abs(a) : C2`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-40.c: New test.

21 months agoAdd effective target to OpenMP tests
Paul-Antoine Arras [Thu, 26 Oct 2023 16:51:59 +0000 (18:51 +0200)] 
Add effective target to OpenMP tests

This adds an effective target DejaGnu directive to prevent these testcases from
failing on GCC configurations that do not support OpenMP.
This fixes 8d2130a4e5c.

gcc/testsuite/ChangeLog:

* gfortran.dg/c_ptr_tests_20.f90: Add "fopenmp" effective target.
* gfortran.dg/c_ptr_tests_21.f90: Add "fopenmp" effective target.

21 months ago[range-op] Remove unused variable in fold_range.
Aldy Hernandez [Thu, 26 Oct 2023 16:56:20 +0000 (12:56 -0400)] 
[range-op] Remove unused variable in fold_range.

gcc/ChangeLog:

* range-op-float.cc (range_operator::fold_range): Delete unused
variable.

21 months ago[range-ops] Remove unneeded parameters from rv_fold.
Aldy Hernandez [Thu, 26 Oct 2023 14:00:16 +0000 (10:00 -0400)] 
[range-ops] Remove unneeded parameters from rv_fold.

Now that the floating point version of rv_fold calculates its result
in an frange, we can remove the superfluous LB, UB, and MAYBE_NAN
arguments.

gcc/ChangeLog:

* range-op-float.cc (range_operator::fold_range): Remove
superfluous code.
(range_operator::rv_fold): Remove unneeded arguments.
(operator_plus::rv_fold): Same.
(operator_minus::rv_fold): Same.
(operator_mult::rv_fold): Same.
(operator_div::rv_fold): Same.
* range-op-mixed.h: Remove lb, ub, and maybe_nan arguments from
rv_fold methods.
* range-op.h: Same.

21 months ago[range-ops] Add frange& argument to rv_fold.
Aldy Hernandez [Sun, 1 Oct 2023 20:54:27 +0000 (16:54 -0400)] 
[range-ops] Add frange& argument to rv_fold.

The floating point version of rv_fold returns its result in 3 pieces:
the lower bound, the upper bound, and a maybe_nan bit.  It is cleaner
to return everything in an frange, thus bringing the floating point
version of rv_fold in line with the integer version.

This first patch adds an frange argument, while keeping the current
functionality, and asserting that we get the same results.  In a
follow-up patch I will nuke the now useless 3 arguments.  Splitting
this into two patches makes it easier to bisect any problems if any
should arise.

gcc/ChangeLog:

* range-op-float.cc (range_operator::fold_range): Pass frange
argument to rv_fold.
(range_operator::rv_fold): Add frange argument.
(operator_plus::rv_fold): Same.
(operator_minus::rv_fold): Same.
(operator_mult::rv_fold): Same.
(operator_div::rv_fold): Same.
* range-op-mixed.h: Add frange argument to rv_fold methods.
* range-op.h: Same.

21 months agoRISC-V: Pass abi to g++ rvv testsuite
Patrick O'Neill [Thu, 26 Oct 2023 00:03:24 +0000 (17:03 -0700)] 
RISC-V: Pass abi to g++ rvv testsuite

On rv32gcv testcases like g++.target/riscv/rvv/base/bug-22.C fail with:
FAIL: g++.target/riscv/rvv/base/bug-22.C (test for excess errors)
Excess errors:
cc1plus: error: ABI requires '-march=rv32'

This patch adds the -mabi argument to g++ rvv tests.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/rvv.exp: Add -mabi argument to CFLAGS.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
21 months agolibatomic: Consider '--with-build-sysroot=[...]' for target libraries' build-tree...
Thomas Schwinge [Mon, 11 Sep 2023 09:36:31 +0000 (11:36 +0200)] 
libatomic: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]

Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit 5ff06d762a88077aff0fb637c931c64e6f47f93d
"libatomic/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC'.

PR testsuite/109951
libatomic/
* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
* Makefile.in: Regenerate.
* configure: Likewise.
* testsuite/Makefile.in: Likewise.
* testsuite/lib/libatomic.exp (libatomic_init): If
'--with-build-sysroot=[...]' was specified, use it for build-tree
testing.
* testsuite/libatomic-site-extra.exp.in (GCC_UNDER_TEST): Don't
set.
(SYSROOT_CFLAGS_FOR_TARGET): Set.

21 months agolibffi: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testin...
Thomas Schwinge [Mon, 11 Sep 2023 08:50:00 +0000 (10:50 +0200)] 
libffi: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]

Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit a0b48358cb1e70e161a87ec5deb7a4b25defba6b
"libffi/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC', 'CXX'.

PR testsuite/109951
libffi/
* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
<local.exp>: Don't set 'CC_FOR_TARGET', 'CXX_FOR_TARGET', instead
set 'SYSROOT_CFLAGS_FOR_TARGET'.
* Makefile.in: Regenerate.
* configure: Likewise.
* include/Makefile.in: Likewise.
* man/Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.
* testsuite/lib/libffi.exp (libffi_target_compile): If
'--with-build-sysroot=[...]' was specified, use it for build-tree
testing.

21 months agotestsuite: Allow general skips/requires in PCH tests
Richard Sandiford [Thu, 26 Oct 2023 15:35:47 +0000 (16:35 +0100)] 
testsuite: Allow general skips/requires in PCH tests

dg-pch.exp handled dg-require-effective-target pch_supported_debug
as a special case, by grepping the source code.  This patch tries
to generalise it to other dg-require-effective-targets, and to
dg-skip-if.

There also seemed to be some errors in check-flags.  It used:

    lappend $args [list <elt>]

which treats the contents of args as a variable name.  I think
it was supposed to be "lappend args" instead.  From the later
code, the element was supposed to be <elt> itself, rather than
a singleton list containing <elt>.

We can also save some time by doing the common early-exit first.

Doing this removes the need to specify the dg-require-effective-target
in both files.  Tested by faking unsupported debug and checking that
the tests were still correctly skipped.

gcc/testsuite/
* lib/target-supports-dg.exp (check-flags): Move default argument
handling further up.  Fix a couple of issues in the lappends.
Avoid frobbing the compiler flags if the return value is already
known to be 1.
* lib/dg-pch.exp (dg-flags-pch): Process the dg-skip-if and
dg-require-effective-target directives to see whether the
assembly test should be skipped.
* gcc.dg/pch/valid-1.c: Remove dg-require-effective-target.
* gcc.dg/pch/valid-1b.c: Likewise.

21 months agoarm: Use deltas for Arm switch tables
Richard Ball [Thu, 26 Oct 2023 15:18:50 +0000 (16:18 +0100)] 
arm: Use deltas for Arm switch tables

With normal optimization for the Arm state, gcc emits an uncompressed
table of jump targets.  The table sits in the middle of the text segment
and is far larger than necessary, especially at -Os.
This patch compresses the table to use deltas in a similar manner to
Thumb code generation.
Similar code is also used for -fpic where we currently generate a jump
to a jump. In this format the jumps are too dense for the hardware branch
predictor to handle accurately, so execution is likely to be very expensive.
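
To make the size argument concrete, here is a small C sketch comparing a table of absolute 32-bit targets with a table of deltas from a base address.  It is an illustration only: the function names are hypothetical, the addresses in the usage note are invented, and the sketch assumes every target lies at or above the base.  It does not reproduce what arm_output_casesi emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a table of N absolute 32-bit jump targets.  */
size_t
absolute_table_bytes (size_t n)
{
  return n * sizeof (uint32_t);
}

/* Smallest entry width (1, 2 or 4 bytes) that can hold every delta
   TARGETS[i] - BASE.  Assumes each target is >= BASE.  */
size_t
delta_entry_bytes (const uint32_t *targets, size_t n, uint32_t base)
{
  uint32_t max_delta = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t delta = targets[i] - base;
      if (delta > max_delta)
	max_delta = delta;
    }
  if (max_delta <= UINT8_MAX)
    return 1;
  if (max_delta <= UINT16_MAX)
    return 2;
  return 4;
}

/* Bytes needed for the same table when stored as deltas.  */
size_t
delta_table_bytes (const uint32_t *targets, size_t n, uint32_t base)
{
  return n * delta_entry_bytes (targets, n, base);
}

/* Recover an absolute target from a stored delta.  */
uint32_t
decode_target (uint32_t base, uint32_t delta)
{
  return base + delta;
}
```

For four case labels within 255 bytes of the base, the absolute table takes 16 bytes while the delta table takes 4.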

Changes to switch statements for arm include a new function to handle the
assembly generation for different machine modes. This allows for more
optimisation to be performed in aout.h where arm has switched from using
ASM_OUTPUT_ADDR_VEC_ELT to using ASM_OUTPUT_ADDR_DIFF_ELT.
In ASM_OUTPUT_ADDR_DIFF_ELT new assembly generation options have been
added to utilise the different machine modes. Additional changes
made to the casesi expand and insn, CASE_VECTOR_PC_RELATIVE,
CASE_VECTOR_SHORTEN_MODE and LABEL_ALIGN_AFTER_BARRIER are all
to accommodate this new approach to switch statement generation.

New tests have been added, and there are no regressions on arm-none-eabi.

gcc/ChangeLog:

* config/arm/aout.h (ASM_OUTPUT_ADDR_DIFF_ELT): Add table output
for different machine modes for arm.
* config/arm/arm-protos.h (arm_output_casesi): New prototype.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Make arm use
ASM_OUTPUT_ADDR_DIFF_ELT.
(CASE_VECTOR_SHORTEN_MODE): Change table size calculation for
TARGET_ARM.
(LABEL_ALIGN_AFTER_BARRIER): Change to accommodate .p2align 2
for TARGET_ARM.
* config/arm/arm.cc (arm_output_casesi): New function.
* config/arm/arm.md (arm_casesi_internal): Change casesi expand
and insn for arm to use new function arm_output_casesi.

gcc/testsuite/ChangeLog:

* gcc.target/arm/arm-switchstatement.c: New test.

21 months agoDarwin: Make metadata symbol labels linker-visible for GNU objc.
Iain Sandoe [Sat, 30 Sep 2023 16:15:16 +0000 (17:15 +0100)]
Darwin: Make metadata symbol labels linker-visible for GNU objc.

Now that we have shifted to using the same relocation mechanism as clang
for Objective-C typeinfo, the static linker needs a linker-visible
symbol for metadata names (this is only needed for GNU Objective-C; for
NeXT the names are in separate sections).

gcc/ChangeLog:

* config/darwin.h
(darwin_label_is_anonymous_local_objc_name): Make metadata names
linker-visible for GNU objective C.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
21 months ago[RA]: Modify cost calculation for dealing with equivalences
Vladimir N. Makarov [Thu, 26 Oct 2023 13:50:40 +0000 (09:50 -0400)]
[RA]: Modify cost calculation for dealing with equivalences

RISCV target developers reported that pseudos with equivalence used in
a loop can be spilled.  Simple changes of heuristics of cost
calculation of pseudos with equivalence or even ignoring equivalences
resulted in numerous testsuite failures on different targets or worse
spec2017 performance.  This patch implements more sophisticated cost
calculations of pseudos with equivalences.  The patch does not change
RA behaviour for targets still using the old reload pass instead of
LRA.  The patch solves the reported problem and improves x86-64
specint2017 a bit (specfp2017 performance stays the same).  The patch
takes into account how the equivalence will be used: whether it will be
integrated into the user insns or will require an input reload insn.  This
requires an additional pass over insns.  To compensate for the RA slowdown,
the patch removes a pass over insns in the reload pass that IRA used before.
This also decouples IRA further from reload and will help to remove the
reload pass in the future if it ever happens.

gcc/ChangeLog:

* dwarf2out.cc (reg_loc_descriptor): Use lra_eliminate_regs when
LRA is used.
* ira-costs.cc: Include regset.h.
(equiv_can_be_consumed_p, get_equiv_regno, calculate_equiv_gains):
New functions.
(find_costs_and_classes): Call calculate_equiv_gains and redefine
mem_cost of pseudos with equivs when LRA is used.
* var-tracking.cc: Include ira.h and lra.h.
(vt_initialize): Use lra_eliminate_regs when LRA is used.

21 months agoFortran: Fix incompatible types between INTEGER(8) and TYPE(c_ptr)
Paul-Antoine Arras [Fri, 20 Oct 2023 10:42:49 +0000 (12:42 +0200)] 
Fortran: Fix incompatible types between INTEGER(8) and TYPE(c_ptr)

In the context of an OpenMP declare variant directive, arguments of type C_PTR
are sometimes recognised as C_PTR in the base function and as INTEGER(8) in the
variant - or the other way around, depending on the parsing order.
This patch prevents such a situation from turning into a compile error.

2023-10-20  Paul-Antoine Arras  <pa@codesourcery.com>
    Tobias Burnus  <tobias@codesourcery.com>

gcc/fortran/ChangeLog:

* interface.cc (gfc_compare_types): Return true if one type is C_PTR
and the other is a compatible INTEGER(8).
* misc.cc (gfc_typename): Handle the case where an INTEGER(8) actually
holds a TYPE(C_PTR).

gcc/testsuite/ChangeLog:

* gfortran.dg/c_ptr_tests_20.f90: New test, checking that INTEGER(8)
and TYPE(C_PTR) are recognised as compatible.
* gfortran.dg/c_ptr_tests_21.f90: New test, exercising the error
detection for C_FUNPTR.

21 months agoDOC: Update COND_LEN document
Juzhe-Zhong [Thu, 26 Oct 2023 03:42:20 +0000 (11:42 +0800)] 
DOC: Update COND_LEN document

gcc/ChangeLog:

* doc/md.texi: Adapt COND_LEN pseudo code.

21 months agoPR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
Roger Sayle [Thu, 26 Oct 2023 09:06:59 +0000 (10:06 +0100)] 
PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.
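
The reason the canonical form is safe can be checked with plain C integer conversions.  The sketch below is an analogy only: uint8_t, uint16_t and uint32_t stand in for QImode, HImode and the wider mode (msp430's PSImode is 20 bits wide, which C has no direct type for), and the function names are hypothetical.

```c
#include <stdint.h>

/* Two nested zero extensions: 8 bits -> 16 bits -> 32 bits.  */
uint32_t
zext_twice (uint8_t x)
{
  return (uint32_t) (uint16_t) x;
}

/* The canonical single zero extension: 8 bits -> 32 bits.  */
uint32_t
zext_once (uint8_t x)
{
  return (uint32_t) x;
}

/* Return 1 if both forms agree for every 8-bit input, else 0.  */
int
zext_forms_agree (void)
{
  for (unsigned int v = 0; v <= UINT8_MAX; v++)
    if (zext_twice ((uint8_t) v) != zext_once ((uint8_t) v))
      return 0;
  return 1;
}
```

Since the intermediate widening only adds zero bits that the outer widening would add anyway, the nested form carries no extra information and a single extension is sufficient.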

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo: AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo: MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
PR rtl-optimization/91865
* combine.cc (make_compound_operation): Avoid creating a
ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
PR rtl-optimization/91865
* gcc.target/msp430/pr91865.c: New test case.

21 months agoPass type of comparison operands instead of comparison result to truth_type_for in...
liuhongt [Wed, 25 Oct 2023 06:36:57 +0000 (14:36 +0800)] 
Pass type of comparison operands instead of comparison result to truth_type_for in build_vec_cmp.

gcc/c/ChangeLog:

* c-typeck.cc (build_vec_cmp): Pass type of arg0 to
truth_type_for.

gcc/cp/ChangeLog:

* typeck.cc (build_vec_cmp): Pass type of arg0 to
truth_type_for.

21 months agoLoongArch: Enable vcond_mask_mn expanders for SF/DF modes.
Jiahao Xu [Thu, 26 Oct 2023 01:34:32 +0000 (09:34 +0800)]
LoongArch: Enable vcond_mask_mn expanders for SF/DF modes.

If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.

gcc/ChangeLog:

* config/loongarch/lasx.md (vcond_mask_<ILASX:mode><ILASX:mode>): Change to
(vcond_mask_<mode><mode256_i>): this.
* config/loongarch/lsx.md (vcond_mask_<ILSX:mode><ILSX:mode>): Change to
(vcond_mask_<mode><mode_i>): this.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: New test.

21 months agotestsuite: Fix _BitInt in gcc.misc-tests/godump-1.c
Stefan Schulze Frielinghaus [Thu, 26 Oct 2023 06:41:24 +0000 (08:41 +0200)] 
testsuite: Fix _BitInt in gcc.misc-tests/godump-1.c

Currently _BitInt is only supported on x86_64 which means that for other
targets all tests fail with e.g.

gcc.misc-tests/godump-1.c:237:1: sorry, unimplemented: '_BitInt(32)' is not supported on this target
  237 | _BitInt(32) b32_v;
      | ^~~~~~~

Instead of requiring _BitInt support for godump-1.c, move _BitInt tests
into godump-2.c such that all other tests in godump-1.c are still
executed in case of missing _BitInt support.

gcc/testsuite/ChangeLog:

* gcc.misc-tests/godump-1.c: Move _BitInt tests into godump-2.c.
* gcc.misc-tests/godump-2.c: New test.

21 months agoMore '#ifdef ASM_OUTPUT_DEF' -> 'if (TARGET_SUPPORTS_ALIASES)' etc.
Thomas Schwinge [Thu, 7 Sep 2023 20:15:08 +0000 (22:15 +0200)] 
More '#ifdef ASM_OUTPUT_DEF' -> 'if (TARGET_SUPPORTS_ALIASES)' etc.

Per commit a8b522b483ebb8c972ecfde8779a7a6ec16aecd6 (Subversion r251048)
"Introduce TARGET_SUPPORTS_ALIASES", there is the idea that a back end may or
may not provide symbol aliasing support ('TARGET_SUPPORTS_ALIASES') independent
of '#ifdef ASM_OUTPUT_DEF', and in particular, depending not just on static but
instead on dynamic (run-time) configuration.  There did remain a few instances
where we currently still assume that from '#ifdef ASM_OUTPUT_DEF' follows
'TARGET_SUPPORTS_ALIASES'.  Change these to 'if (TARGET_SUPPORTS_ALIASES)',
similarly, or 'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
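
The difference between the two kinds of check can be sketched in C as follows.  All SKETCH_ and sketch_ names are hypothetical stand-ins: SKETCH_ASM_OUTPUT_DEF plays the role of a macro that is either defined or not when the compiler is built, while SKETCH_TARGET_SUPPORTS_ALIASES expands to an expression that is evaluated at run time.

```c
#include <stdbool.h>

/* Hypothetical run-time setting, e.g. derived from a command-line option.  */
bool sketch_aliases_enabled = true;

/* Static knowledge: the back end knows how to emit an alias definition.  */
#define SKETCH_ASM_OUTPUT_DEF 1

/* Dynamic knowledge: whether aliases may actually be used right now.  */
#define SKETCH_TARGET_SUPPORTS_ALIASES (sketch_aliases_enabled)

/* Decided when this file is compiled; cannot see the run-time setting.  */
bool
can_use_alias_static (void)
{
#ifdef SKETCH_ASM_OUTPUT_DEF
  return true;
#else
  return false;
#endif
}

/* Decided on every call; follows the run-time setting.  */
bool
can_use_alias_dynamic (void)
{
  if (SKETCH_TARGET_SUPPORTS_ALIASES)
    return true;
  return false;
}
```

With sketch_aliases_enabled set to false, the static check still answers true while the dynamic check answers false, which is the mismatch the remaining '#ifdef' sites could run into.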

gcc/
* ipa-icf.cc (sem_item::target_supports_symbol_aliases_p):
'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);' before
'return true;'.
* ipa-visibility.cc (function_and_variable_visibility): Change
'#ifdef ASM_OUTPUT_DEF' to 'if (TARGET_SUPPORTS_ALIASES)'.
* varasm.cc (output_constant_pool_contents)
[#ifdef ASM_OUTPUT_DEF]:
'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
(do_assemble_alias) [#ifdef ASM_OUTPUT_DEF]:
'if (!TARGET_SUPPORTS_ALIASES)',
'gcc_checking_assert (seen_error ());'.
(assemble_alias): Change '#if !defined (ASM_OUTPUT_DEF)' to
'if (!TARGET_SUPPORTS_ALIASES)'.
(default_asm_output_anchor):
'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.

21 months agoset hardcmp eh probs
Alexandre Oliva [Thu, 26 Oct 2023 06:06:09 +0000 (03:06 -0300)] 
set hardcmp eh probs

Set execution count of EH blocks, and probability of EH edges.

for  gcc/ChangeLog

PR tree-optimization/111520
* gimple-harden-conditionals.cc
(pass_harden_compares::execute): Set EH edge probability and
EH block execution count.

for  gcc/testsuite/ChangeLog

PR tree-optimization/111520
* g++.dg/torture/harden-comp-pr111520.cc: New.