git.ipfire.org Git - thirdparty/gcc.git/log

RISC-V: Add more tests for RVV floating-point FRM.

Add more test cases, including both asm-check and run tests, for RVV FRM.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-10.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-8.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-9.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-1.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-2.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-3.c: New test.

vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS

This patch adjusts the cost handling on VMAT_CONTIGUOUS in
function vectorizable_load. We don't call function
vect_model_load_cost for it any more. It removes function
vect_model_load_cost which becomes useless and unreachable
now.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_load_cost): Remove.
(vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS without
calling vect_model_load_cost.

vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE

This patch adjusts the cost handling on
VMAT_CONTIGUOUS_PERMUTE in function vectorizable_load. We
don't call function vect_model_load_cost for it any more.

As the affected test case gcc.target/i386/pr70021.c shows,
the previous costing can under-cost the generated vector
loads: for VMAT_CONTIGUOUS_PERMUTE, function
vect_model_load_cost doesn't consider the group size, which
is used as vec_num during the transformation.

This patch makes the count of vector loads in costing
consistent with what we generate during the transformation.
To be more specific, for the given test case, the memory
access b[i_20] was costed as 2 vector loads before; with
this patch it is costed as 8 instead, which matches the
final count of vector loads generated from b. With this
costing change, cost model analysis no longer considers it
profitable to vectorize the first loop, so this patch
adjusts the test case to run without the vect cost model.

Note that this test case also exposes something we can
improve further: although the number of vector permutations
we cost and the number we generate are consistent, DCE can
later remove some unused permutations. It would be good if
we could predict that and generate only the necessary
permutations.
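
The arithmetic behind the change from 2 to 8 costed loads can be
sketched as below. This is a toy model, not the code in
tree-vect-stmts.cc: the function names are hypothetical, and the split
of 8 into 2 copies times a group size of 4 is an assumption made for
illustration. The message above only states the totals and that the
group size (vec_num) was previously ignored.

```c
#include <assert.h>

/* Toy model only; these names are hypothetical, not GCC functions.  */

/* Old costing: one vector load per copy, group size ignored.  */
unsigned loads_costed_old (unsigned ncopies)
{
  return ncopies;
}

/* New costing: the transform emits vec_num loads per copy, so the
   costing counts the same number.  */
unsigned loads_costed_new (unsigned ncopies, unsigned vec_num)
{
  assert (vec_num > 0);
  return ncopies * vec_num;
}
```

With the assumed values, loads_costed_old (2) gives 2 and
loads_costed_new (2, 4) gives 8.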

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_load_cost): Assert this function only
handle memory_access_type VMAT_CONTIGUOUS, remove some
VMAT_CONTIGUOUS_PERMUTE related handlings.
(vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS_PERMUTE
without calling vect_model_load_cost.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr70021.c: Adjust with -fno-vect-cost-model.

vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE

This patch adjusts the cost handling on
VMAT_CONTIGUOUS_REVERSE in function vectorizable_load. We
don't call function vect_model_load_cost for it any more.

With this change we no longer miscount some required vector
permutations, as the associated test case shows.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_load_cost): Assert it won't get
VMAT_CONTIGUOUS_REVERSE any more.
(vectorizable_load): Adjust the costing handling on
VMAT_CONTIGUOUS_REVERSE without calling vect_model_load_cost.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c: New test.

vect: Adjust vectorizable_load costing on VMAT_LOAD_STORE_LANES

This patch adjusts the cost handling on
VMAT_LOAD_STORE_LANES in function vectorizable_load. We
don't call function vect_model_load_cost for it any more.

It follows what we do in the function vect_model_load_cost,
and shouldn't have any functional changes.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling on
VMAT_LOAD_STORE_LANES without calling vect_model_load_cost.
(vectorizable_load): Remove VMAT_LOAD_STORE_LANES related handling and
assert it will never get VMAT_LOAD_STORE_LANES.

vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER

This patch adjusts the cost handling on VMAT_GATHER_SCATTER
in function vectorizable_load. We don't call function
vect_model_load_cost for it any more.

This mainly affects gather loads with IFN and emulated
gather loads, and follows the handling in function
vect_model_load_cost. This patch shouldn't have any
functional changes.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling on
VMAT_GATHER_SCATTER without calling vect_model_load_cost.
(vect_model_load_cost): Adjust the assertion on VMAT_GATHER_SCATTER,
remove VMAT_GATHER_SCATTER related handlings and the related parameter
gs_info.

vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP

This patch adjusts the cost handling on VMAT_ELEMENTWISE
and VMAT_STRIDED_SLP in function vectorizable_load.  We
don't call function vect_model_load_cost for them any more.

As PR82255 shows, we don't always need a vector construction
there. Moving costing next to the transform lets us cost
vector construction only when it's actually needed.
Besides, it counts the number of loads consistently in
some cases.

PR tree-optimization/82255

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling
on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP without calling
vect_model_load_cost.
(vect_model_load_cost): Assert it won't get VMAT_ELEMENTWISE and
VMAT_STRIDED_SLP any more, and remove their related handlings.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c: New test.

2023-06-13  Bill Schmidt  <wschmidt@linux.ibm.com>
    Kewen Lin  <linkw@linux.ibm.com>

vect: Adjust vectorizable_load costing on VMAT_INVARIANT

This patch adjusts the cost handling on VMAT_INVARIANT in
function vectorizable_load. We don't call function
vect_model_load_cost for it any more.

To make the costing on VMAT_INVARIANT better, this patch
queries hoist_defs_of_uses for the hoisting decision and
adds costs for a different "where" based on it. Currently
function hoist_defs_of_uses always hoists the defs of all
SSA uses; the new argument HOIST_P avoids the actual
hoisting during the costing phase.
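
The idea of asking "could this be hoisted?" without actually hoisting
can be sketched as follows. This is a minimal illustration under
assumed names: struct stmt and hoist_defs_sketch are hypothetical
stand-ins, not the GCC data structures or the real
hoist_defs_of_uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a statement; not a GCC data structure.  */
struct stmt
{
  bool defs_in_loop;   /* some def of its uses lives inside the loop */
  bool defs_movable;   /* those defs can be moved to the preheader   */
  bool hoisted;        /* set once the defs have actually been moved */
};

/* Return true if the defs are, or can be placed, outside the loop.
   S is modified only when HOIST_P is true, so a costing phase can
   query the decision without changing anything.  */
bool hoist_defs_sketch (struct stmt *s, bool hoist_p)
{
  assert (s != 0);
  if (!s->defs_in_loop)
    return true;
  if (!s->defs_movable)
    return false;
  if (hoist_p)
    {
      s->defs_in_loop = false;
      s->hoisted = true;
    }
  return true;
}
```

Costing would call this with hoist_p set to false and pick the cost
location from the result; the transform would call it with true.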

gcc/ChangeLog:

* tree-vect-stmts.cc (hoist_defs_of_uses): Add one argument HOIST_P.
(vectorizable_load): Adjust the handling on VMAT_INVARIANT to respect
hoisting decision and without calling vect_model_load_cost.
(vect_model_load_cost): Assert it won't get VMAT_INVARIANT any more
and remove VMAT_INVARIANT related handlings.

vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl

This patch adds one extra argument cost_vec to function
vect_build_gather_load_calls, so that we can do costing
next to the transform in vect_build_gather_load_calls.
For now, the implementation just follows the handling in
vect_model_load_cost. That handling isn't very good, so a
FIXME is placed for further improvement. This patch should
not cause any functional changes.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_build_gather_load_calls): Add the handlings
on costing with one extra argument cost_vec.
(vectorizable_load): Adjust the call to vect_build_gather_load_calls.
(vect_model_load_cost): Assert it won't get VMAT_GATHER_SCATTER with
gs_info.decl set any more.

vect: Move vect_model_load_cost next to the transform in vectorizable_load

This is an initial patch to move costing next to the
transform. It still uses vect_model_load_cost for costing,
but moves and duplicates the call down according to the
handling of the different vect_memory_access_types, which
should make the subsequent patches easier to review. This
patch should not have any functional changes.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Move and duplicate the call
to vect_model_load_cost down to some different transform paths
according to the handlings of different vect_memory_access_types.

tree: Hide wi::from_mpz from GENERATOR_FILE

Similar to r0-85707-g34917a102a4e0c for PR35051, the uses
of mpz_t should be guarded with "#ifndef GENERATOR_FILE".
This patch is to fix it and avoid some possible build
errors.
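
The guard pattern itself can be sketched in plain C as below. The real
declaration is C++ in tree.h; here big_int, from_big_int and
has_big_int_api are hypothetical names standing in for mpz_t and its
users, chosen only for illustration.

```c
#include <assert.h>

/* Generator programs are built with GENERATOR_FILE defined and must
   not see declarations that depend on a library they do not use.  */
#ifndef GENERATOR_FILE
typedef struct { long limb; } big_int;   /* toy stand-in for mpz_t */

long from_big_int (const big_int *x)
{
  assert (x != 0);
  return x->limb;
}
#endif

/* Report whether the guarded API was compiled in.  */
int has_big_int_api (void)
{
#ifndef GENERATOR_FILE
  return 1;
#else
  return 0;
#endif
}
```

Compiling the same file with -DGENERATOR_FILE drops the type and the
conversion function, so nothing in a generator build refers to them.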

gcc/ChangeLog:

* tree.h (wi::from_mpz): Hide from GENERATOR_FILE.

mklog: Add --append option to auto add generated ChangeLog to patch file

This tiny patch adds an --append option to mklog.py that appends the generated
change-log to the corresponding patch file. With this option there is no need
to manually copy the generated change-log into the patch file. For example:

Running `mklog.py --append /path/to/this/patch` will add the generated change-log
to the right place in the /path/to/this/patch file.

contrib/ChangeLog:

* mklog.py: Add --append option.

Signed-off-by: Lehua Ding <lehua.ding@rivai.ai>

RISC-V: Support gather_load/scatter RVV auto-vectorization

This patch fully supports gather_load/scatter_store:
1. Support single-rgroup on both RV32/RV64.
2. Support index element widths that are the same as or smaller than Pmode.
3. Support VLA SLP with gather/scatter.
4. Fully test all gather/scatter with LMUL = M1/M2/M4/M8, both VLA and VLS.
5. Fix a bug in handling (subreg:SI (const_poly_int:DI)).
6. Fix a bug in vec_perm, which is used by gather/scatter SLP.

All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
We fully support these 4 kinds of gather/scatter:
1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy mask (Full vector).
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real mask.
3. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy mask.
4. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.
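
The four combinations above can be pictured with a scalar reference
model. This is a sketch, not the backend expansion: the function name
is hypothetical, element indices are used instead of scaled byte
offsets, and leaving inactive lanes of dst unchanged is a
simplification chosen here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a length- and mask-controlled gather load.
   Lane i is loaded only when i < len and mask[i] is set.
   A "dummy length" is len == nlanes; a "dummy mask" is all true.  */
void len_mask_gather_model (int32_t *dst, const int32_t *base,
                            const uint32_t *index, const bool *mask,
                            size_t len, size_t nlanes)
{
  assert (len <= nlanes);
  for (size_t i = 0; i < nlanes; i++)
    if (i < len && mask[i])
      dst[i] = base[index[i]];
}
```

Passing len == nlanes with an all-true mask models kind 1 (full
vector); a shorter len or a partial mask models kinds 2 to 4.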

Based on the discussions with Richards, we don't lower vlse/vsse in the RISC-V backend for strided load/store.
Instead, we leave it to the middle-end to handle that.

Regression testing passed. OK for trunk?

gcc/ChangeLog:

* config/riscv/autovec.md
(len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>): New pattern.
(len_mask_gather_load<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
(len_mask_gather_load<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
(len_mask_gather_load<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
(len_mask_gather_load<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
(len_mask_gather_load<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
(len_mask_gather_load<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
(len_mask_gather_load<mode><mode>): Ditto.
(len_mask_scatter_store<VNX1_QHSD:mode><VNX1_QHSDI:mode>): Ditto.
(len_mask_scatter_store<VNX2_QHSD:mode><VNX2_QHSDI:mode>): Ditto.
(len_mask_scatter_store<VNX4_QHSD:mode><VNX4_QHSDI:mode>): Ditto.
(len_mask_scatter_store<VNX8_QHSD:mode><VNX8_QHSDI:mode>): Ditto.
(len_mask_scatter_store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.
(len_mask_scatter_store<VNX32_QHS:mode><VNX32_QHSI:mode>): Ditto.
(len_mask_scatter_store<VNX64_QH:mode><VNX64_QHI:mode>): Ditto.
(len_mask_scatter_store<mode><mode>): Ditto.
* config/riscv/predicates.md (const_1_operand): New predicate.
(vector_gs_scale_operand_16): Ditto.
(vector_gs_scale_operand_32): Ditto.
(vector_gs_scale_operand_64): Ditto.
(vector_gs_extension_operand): Ditto.
(vector_gs_scale_operand_16_rv32): Ditto.
(vector_gs_scale_operand_32_rv32): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add gather/scatter.
(expand_gather_scatter): New function.
* config/riscv/riscv-v.cc (gen_const_vector_dup): Add gather/scatter.
(emit_vlmax_masked_store_insn): New function.
(emit_nonvlmax_masked_store_insn): Ditto.
(modulo_sel_indices): Ditto.
(expand_vec_perm): Fix SLP for gather/scatter.
(prepare_gather_scatter): New function.
(expand_gather_scatter): Ditto.
* config/riscv/riscv.cc (riscv_legitimize_move): Fix bug of
(subreg:SI (DI CONST_POLY_INT)).
* config/riscv/vector-iterators.md: Add gather/scatter.
* config/riscv/vector.md (vec_duplicate<mode>): Use "@" instead.
(@vec_duplicate<mode>): Ditto.
(@pred_indexed_<order>store<VNX16_QHS:mode><VNX16_QHSDI:mode>):
Fix name.
(@pred_indexed_<order>store<VNX16_QHSD:mode><VNX16_QHSDI:mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-6.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-7.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-8.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-9.c: New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-12.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-11.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/mask_scatter_store_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-10.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-8.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/scatter_store_run-9.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c:
New test.

Daily bump.

RISC-V: Support COND_LEN_* patterns

The middle-end support has been merged:
https://github.com/gcc-mirror/gcc/commit/0d4dd7e07a879d6c07a33edb2799710faa95651e

With this patch, we can handle operations that may trap on elements outside the loop.
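
Why a conditional, length-controlled operation avoids such traps can be
seen in a scalar model. This is a sketch under assumptions: the
function name is hypothetical, and the rule "inactive lanes take the
else value" reflects my reading of the COND_LEN_* semantics rather
than code from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a conditional, length-controlled remainder.
   Only lanes with i < len and cond[i] set perform the division, so a
   zero divisor in an inactive lane is never evaluated.  */
void cond_len_rem_model (int8_t *dst, const bool *cond,
                         const int8_t *a, const int8_t *b,
                         const int8_t *else_vals,
                         size_t len, size_t nlanes)
{
  assert (len <= nlanes);
  for (size_t i = 0; i < nlanes; i++)
    dst[i] = (i < len && cond[i]) ? (int8_t) (a[i] % b[i]) : else_vals[i];
}
```

In the model, lanes past len or with a false condition simply copy the
else value, which is what allows the whole loop to run without a
scalar epilogue.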

The following 2 cases are addressed by this patch:

1. integer division:

  #include <stdint.h>

  #define TEST_TYPE(TYPE) \
  __attribute__((noipa)) \
  void vrem_##TYPE (TYPE * __restrict dst, TYPE * __restrict a, TYPE * __restrict b, int n) \
  { \
    for (int i = 0; i < n; i++) \
      dst[i] = a[i] % b[i]; \
  }

  #define TEST_ALL() \
   TEST_TYPE(int8_t)

  TEST_ALL()

  Before this patch:

   vrem_int8_t:
        ble     a3,zero,.L14
        csrr    t4,vlenb
        addiw   a5,a3,-1
        addiw   a4,t4,-1
        sext.w  t5,a3
        bltu    a5,a4,.L10
        csrr    t3,vlenb
        subw    t3,t5,t3
        li      a5,0
        vsetvli t6,zero,e8,m1,ta,ma
.L4:
        add     a6,a2,a5
        add     a7,a0,a5
        add     t1,a1,a5
        mv      a4,a5
        add     a5,a5,t4
        vl1re8.v        v2,0(a6)
        vl1re8.v        v1,0(t1)
        sext.w  a6,a5
        vrem.vv v1,v1,v2
        vs1r.v  v1,0(a7)
        bleu    a6,t3,.L4
        csrr    a5,vlenb
        addw    a4,a4,a5
        sext.w  a5,a4
        beq     t5,a4,.L16
.L3:
        csrr    a6,vlenb
        subw    t5,t5,a4
        srli    a6,a6,1
        addiw   t1,t5,-1
        addiw   a7,a6,-1
        bltu    t1,a7,.L9
        slli    a4,a4,32
        srli    a4,a4,32
        add     t0,a1,a4
        add     t6,a2,a4
        add     a4,a0,a4
        vsetvli a7,zero,e8,mf2,ta,ma
        sext.w  t3,a6
        vle8.v  v1,0(t0)
        vle8.v  v2,0(t6)
        subw    t4,t5,a6
        vrem.vv v1,v1,v2
        vse8.v  v1,0(a4)
        mv      t1,t3
        bltu    t4,t3,.L7
        csrr    t1,vlenb
        add     a4,a4,a6
        add     t0,t0,a6
        add     t6,t6,a6
        sext.w  t1,t1
        vle8.v  v1,0(t0)
        vle8.v  v2,0(t6)
        vrem.vv v1,v1,v2
        vse8.v  v1,0(a4)
.L7:
        addw    a5,t1,a5
        beq     t5,t1,.L14
.L9:
        add     a4,a1,a5
        add     a6,a2,a5
        lb      a6,0(a6)
        lb      a4,0(a4)
        add     a7,a0,a5
        addi    a5,a5,1
        remw    a4,a4,a6
        sext.w  a6,a5
        sb      a4,0(a7)
        bgt     a3,a6,.L9
.L14:
        ret
.L10:
        li      a4,0
        li      a5,0
        j       .L3
.L16:
        ret

After this patch:

   vrem_int8_t:
        ble     a3,zero,.L5
.L3:
        vsetvli a5,a3,e8,m1,tu,ma
        vle8.v  v1,0(a1)
        vle8.v  v2,0(a2)
        sub     a3,a3,a5
        vrem.vv v1,v1,v2
        vse8.v  v1,0(a0)
        add     a1,a1,a5
        add     a2,a2,a5
        add     a0,a0,a5
        bne     a3,zero,.L3
.L5:
        ret

2. Floating-point operation **WITHOUT** -ffast-math:

    #define TEST_TYPE(TYPE) \
    __attribute__((noipa)) \
    void vadd_##TYPE (TYPE * __restrict dst, TYPE *__restrict a, TYPE *__restrict b, int n) \
    { \
      for (int i = 0; i < n; i++) \
        dst[i] = a[i] + b[i]; \
    }

    #define TEST_ALL() \
     TEST_TYPE(float) \

    TEST_ALL()

Before this patch:

   vadd_float:
        ble     a3,zero,.L10
        csrr    a4,vlenb
        srli    t3,a4,2
        addiw   a5,a3,-1
        addiw   a6,t3,-1
        sext.w  t6,a3
        bltu    a5,a6,.L7
        subw    t5,t6,t3
        mv      t1,a1
        mv      a7,a2
        mv      a6,a0
        li      a5,0
        vsetvli t4,zero,e32,m1,ta,ma
.L4:
        vl1re32.v       v1,0(t1)
        vl1re32.v       v2,0(a7)
        addw    a5,a5,t3
        vfadd.vv        v1,v1,v2
        vs1r.v  v1,0(a6)
        add     t1,t1,a4
        add     a7,a7,a4
        add     a6,a6,a4
        bgeu    t5,a5,.L4
        beq     t6,a5,.L10
        sext.w  a5,a5
.L3:
        slli    a4,a5,2
.L6:
        add     a6,a1,a4
        add     a7,a2,a4
        flw     fa4,0(a6)
        flw     fa5,0(a7)
        add     a6,a0,a4
        addiw   a5,a5,1
        fadd.s  fa5,fa5,fa4
        addi    a4,a4,4
        fsw     fa5,0(a6)
        bgt     a3,a5,.L6
.L10:
        ret
.L7:
        li      a5,0
        j       .L3

After this patch:

   vadd_float:
        ble     a3,zero,.L5
.L3:
        vsetvli a5,a3,e32,m1,tu,ma
        slli    a4,a5,2
        vle32.v v1,0(a1)
        vle32.v v2,0(a2)
        sub     a3,a3,a5
        vfadd.vv        v1,v1,v2
        vse32.v v1,0(a0)
        add     a1,a1,a4
        add     a2,a2,a4
        add     a0,a0,a4
        bne     a3,zero,.L3
.L5:
        ret
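
The shape of the two "after" loops, where vsetvli picks the element
count each iteration and no scalar epilogue remains, can be modeled in
C as below. This is a sketch: set_vl_model is a hypothetical stand-in
that always grants min (avl, vlmax), which is a simplification of what
hardware is allowed to return, and vlmax is passed in as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vsetvli: grant min (avl, vlmax) elements.  */
static size_t set_vl_model (size_t avl, size_t vlmax)
{
  return avl < vlmax ? avl : vlmax;
}

/* Strip-mined float add.  Returns the number of loop iterations so the
   absence of a separate tail loop is visible to the caller.  */
size_t vadd_strip_mined (float *dst, const float *a, const float *b,
                         size_t n, size_t vlmax)
{
  assert (vlmax > 0);
  size_t iterations = 0;
  while (n != 0)
    {
      size_t vl = set_vl_model (n, vlmax);
      for (size_t i = 0; i < vl; i++)   /* one vector op on vl lanes */
        dst[i] = a[i] + b[i];
      a += vl;
      b += vl;
      dst += vl;
      n -= vl;
      iterations++;
    }
  return iterations;
}
```

For n = 10 and vlmax = 4 the model runs three iterations handling 4, 4
and 2 elements, with the last partial chunk handled by the same loop
body.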

gcc/ChangeLog:

* config/riscv/autovec.md (cond_len_<optab><mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_cond_len_binop): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_tu_insn): Ditto.
(emit_nonvlmax_fp_tu_insn): Ditto.
(need_fp_rounding_p): Ditto.
(expand_cond_len_binop): Ditto.
* config/riscv/riscv.cc (riscv_preferred_else_value): Ditto.
(TARGET_PREFERRED_ELSE_VALUE): New target hook.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Adapt testcase.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vadd-run-nofm.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run-nofm.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv-nofm.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmul-run-nofm.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv-nofm.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vsub-run-nofm.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv-nofm.c: New test.

Break out profile updating code from gimple_duplicate_sese_region

Move profile updating to tree-ssa-loop-ch.cc since it is
now quite ch specific. There are no functional changes.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* tree-cfg.cc (gimple_duplicate_sese_region): Rename to ...
(gimple_duplicate_seme_region): ... this; break out profile updating
code to ...
* tree-ssa-loop-ch.cc (update_profile_after_ch): ... here.
(ch_base::copy_headers): Update.
* tree-cfg.h (gimple_duplicate_sese_region): Rename to ...
(gimple_duplicate_seme_region): ... this.

[range-op] Take known mask into account for bitwise ands [PR107043]

PR tree-optimization/107043

gcc/ChangeLog:

* range-op.cc (operator_bitwise_and::op1_range): Update bitmask.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr107043.c: New test.

[range-op] Take known set bits into account in popcount [PR107053]

This patch teaches popcount about known set bits which are now
available in the irange.
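
How known set bits tighten a popcount range can be sketched as follows.
This is an illustration only, not the code in gimple-range-op.cc: the
struct and function names are hypothetical, and the convention that a
mask bit of 1 means "unknown" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Value/mask pair: a mask bit of 1 means the bit is unknown; where
   the mask bit is 0, the matching value bit is the known bit.  */
struct bitmask_model
{
  uint64_t value;
  uint64_t mask;
};

/* Portable population count.  */
static unsigned popcount64 (uint64_t x)
{
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}

/* Bits known to be 1 give a lower bound on popcount; bits that are
   either known 1 or unknown give an upper bound.  */
void popcount_bounds (struct bitmask_model bm, unsigned *lo, unsigned *hi)
{
  assert (lo != 0 && hi != 0);
  *lo = popcount64 (bm.value & ~bm.mask);
  *hi = popcount64 (bm.value | bm.mask);
}
```

With value 0x5 and mask 0xA, two bits are known to be set and two are
unknown, so the result lies between 2 and 4.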

PR tree-optimization/107053

gcc/ChangeLog:

* gimple-range-op.cc (cfn_popcount): Use known set bits.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr107053.c: New test.

libstdc++: Check conversion from filesystem::path to wide strings [PR95048]

The testcase added for this bug only checks conversion from wide strings
on construction, but the fix also covered conversion to wide strings via
path::wstring(). Add checks for that, and u16string() and u32string().

libstdc++-v3/ChangeLog:

PR libstdc++/95048
* testsuite/27_io/filesystem/path/construct/95048.cc: Check
conversions to wide strings.
* testsuite/experimental/filesystem/path/construct/95048.cc:
Likewise.

libstdc++: Compile basic_file_stdio.cc for LFS

Instead of using fopen64, lseek64, and fstat64 we can just include
<bits/largefile-config.h> which defines _FILE_OFFSET_BITS=64 (and
similar target-specific macros). Then we can just use fopen, lseek and
fstat as normal, and they'll be the LFS versions if supported by the
target.
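
The effect of the LFS macro can be sketched in plain C. This example
defines _FILE_OFFSET_BITS directly instead of including
<bits/largefile-config.h>, and file_size_lfs is a hypothetical helper,
so treat it as an illustration of the idea rather than what
basic_file_stdio.cc does.

```c
/* These must come before any system header so that off_t, lseek and
   fstat resolve to their 64-bit variants where the target has them.  */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return the size of the file behind F, or -1 on error.  Plain fstat
   is used; no fstat64 spelling is needed.  */
long long file_size_lfs (FILE *f)
{
  assert (f != 0);
  struct stat st;
  if (fflush (f) != 0 || fstat (fileno (f), &st) != 0)
    return -1;
  return (long long) st.st_size;
}
```

The call sites stay identical on targets with and without LFS; only
the macro decides which underlying variant is used.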

libstdc++-v3/ChangeLog:

* config/io/basic_file_stdio.cc: Define LFS macros.
(__basic_file<char>::open): Use fopen unconditionally.
(get_file_offset): Use lseek unconditionally.
(__basic_file<char>::seekoff): Likewise.
(__basic_file<char>::showmanyc): Use fstat unconditionally.

libstdc++: Fix --enable-cstdio=stdio_pure [PR110574]

When configured with --enable-cstdio=stdio_pure we need to consistently
use fseek and not mix seeks on the file descriptor with reads and writes
on the FILE stream.

There are also a number of bugs related to error handling and return
values, because fread and fwrite return 0 on error, not -1, and fseek
returns 0 on success, not the file offset.
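
The return-value conventions mentioned above can be shown with two
small helpers. These are hypothetical functions written for this
illustration, not the ones in basic_file_stdio.cc, and they use fseek
and ftell rather than fseeko and ftello.

```c
#include <assert.h>
#include <stdio.h>

/* Write N bytes.  fwrite reports failure as a short count, never as
   -1, so compare the count with N.  Returns 0 on success, -1 on
   error.  */
int write_all_stdio (FILE *f, const char *buf, size_t n)
{
  assert (f != 0 && buf != 0);
  size_t written = fwrite (buf, 1, n, f);
  return written == n ? 0 : -1;
}

/* Seek, then ask for the offset separately.  fseek returns 0 on
   success, not the new position.  Returns the offset, or -1 on
   error.  */
long seek_and_tell (FILE *f, long off, int whence)
{
  assert (f != 0);
  if (fseek (f, off, whence) != 0)
    return -1;
  return ftell (f);
}
```

Using only stream functions here also keeps seeks consistent with
buffered reads and writes on the same FILE.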

libstdc++-v3/ChangeLog:

PR libstdc++/110574
* acinclude.m4 (GLIBCXX_CHECK_LFS): Check for fseeko and ftello
and define _GLIBCXX_USE_FSEEKO_FTELLO.
* config.h.in: Regenerate.
* configure: Regenerate.
* config/io/basic_file_stdio.cc (xwrite) [_GLIBCXX_USE_STDIO_PURE]:
Check for fwrite error correctly.
(__basic_file<char>::xsgetn) [_GLIBCXX_USE_STDIO_PURE]: Check for
fread error correctly.
(get_file_offset): New function.
(__basic_file<char>::seekoff) [_GLIBCXX_USE_STDIO_PURE]: Use
fseeko if available. Use get_file_offset instead of return value
of fseek.
(__basic_file<char>::showmanyc): Use get_file_offset.

IRA+LRA: Change return type of predicate functions from int to bool

gcc/ChangeLog:

* ira.cc (equiv_init_varies_p): Change return type from int to bool
and adjust function body accordingly.
(equiv_init_movable_p): Ditto.
(memref_used_between_p): Ditto.
* lra-constraints.cc (valid_address_p): Ditto.

libstdc++: Use __is_enum built-in trait

This patch replaces is_enum<T>::value with __is_enum built-in trait in
the type_traits header.

libstdc++-v3/ChangeLog:

* include/std/type_traits (__make_unsigned_selector): Use
__is_enum built-in trait.
(__make_signed_selector): Likewise.
(__underlying_type_impl): Likewise.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

[range-op] Enable value/mask propagation in range-op.

Throw the switch in range-ops to make full use of the value/mask
information instead of only the nonzero bits. This will cause most of
the operators implemented in range-ops to use the value/mask
information calculated by CCP's bit_value_binop() function which
range-ops uses. This opens up more optimization opportunities.
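
What a value/mask pair buys over nonzero bits alone can be seen on a
bitwise AND. The sketch below uses the usual bit-lattice rule for AND;
the struct and function names are hypothetical and this is not the
code of bit_value_binop. A mask bit of 1 is assumed to mean "unknown".

```c
#include <assert.h>
#include <stdint.h>

/* Value/mask pair: mask bit 1 = unknown bit, mask bit 0 = the bit is
   known and equals the matching value bit.  */
struct vm
{
  uint64_t value;
  uint64_t mask;
};

/* Bitwise AND on value/mask pairs.  A result bit is unknown only when
   some input bit is unknown and neither input bit is a known 0.  */
struct vm vm_and (struct vm a, struct vm b)
{
  /* Invariant of this sketch: unknown bits carry a zero value.  */
  assert ((a.value & a.mask) == 0 && (b.value & b.mask) == 0);
  struct vm r;
  r.mask = (a.mask | b.mask) & (a.value | a.mask) & (b.value | b.mask);
  r.value = a.value & b.value;
  return r;
}
```

Tracking nonzero bits alone would only say which bits may be set; the
pair also records which bits must be set, which later operators can
use.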

In follow-up patches I will change the global range setter
(set_range_info) to be able to save the value/mask pair, and make both
CCP and IPA be able to save the known ones bit info, instead of
throwing it away.

gcc/ChangeLog:

* range-op.cc (irange_to_masked_value): Remove.
(update_known_bitmask): Update irange value/mask pair instead of
only updating nonzero bits.

gcc/testsuite/ChangeLog:

* gcc.dg/pr83073.c: Adjust testcase.

Improve profile update in loop-ch

Improve profile update in loop-ch to handle the situation where the duplicated header
has a loop invariant test.  In this case we know that all of the count of the exit edge belongs to
the duplicated loop header edge and can update probabilities accordingly.
Since we also do all the work to track this information from analysis to duplication,
I also added code to turn those conditionals into constants so we do not need a later
jump threading pass to clean up.
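
At source level, the situation looks roughly like the pair of
functions below. This is a hand-written illustration with hypothetical
names; the pass itself works on GIMPLE, and the second function only
approximates what header copying plus constant folding would leave.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: the header tests the loop-invariant STOP and the
   IV-based condition i < n on every iteration.  */
long sum_original (const int *a, size_t n, int stop)
{
  assert (a != 0 || n == 0);
  long sum = 0;
  for (size_t i = 0; !stop && i < n; i++)
    sum += a[i];
  return sum;
}

/* After header copying (sketch): both tests run once in the
   duplicated header.  Inside the loop the invariant test is known to
   be true, so it is folded away and only i < n remains.  */
long sum_after_ch (const int *a, size_t n, int stop)
{
  assert (a != 0 || n == 0);
  long sum = 0;
  size_t i = 0;
  if (!stop && i < n)
    do
      {
        sum += a[i];
        i++;
      }
    while (i < n);
  return sum;
}
```

Every exit caused by the invariant test happens in the duplicated
header, which is why the whole count of that exit edge can be assigned
there.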

This made me work out that the propagation was buggy in a few aspects:
1) it handled every PHI as a PHI in the header and incorrectly assigned some PHIs
    to be IV-like when they are not
2) it did not check for novops calls, which are not required to return the same
    value on every invocation.
3) I also added a check for asm statements since those are not necessarily
    reproducible either.

I would like to make more changes, but tried to prevent this patch from
snowballing.  The analysis of what statements will remain after duplication can
be improved.  I think we should use ranger queries for blocks other than the first
basic block, too, and possibly drop the IV heuristics then.  Also, a lot
of this logic is pretty much the same as the analysis in the peeling pass, so unifying
them would be nice.

I also think I should move the profile update out of
gimple_duplicate_sese_region (it is now very specific to ch) and rename it,
since those regions are single entry multiple exit.

Bootstrapped/regtested x86_64-linux, OK?

Honza

gcc/ChangeLog:

* tree-cfg.cc (gimple_duplicate_sese_region): Add ORIG_ELIMINATED_EDGES
parameter and rewrite profile updating code to handle edges elimination.
* tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
* tree-ssa-loop-ch.cc (loop_invariant_op_p): New function.
(loop_iv_derived_p): New function.
(should_duplicate_loop_header_p): Track invariant exit edges; fix handling
of PHIs and propagation of IV derived variables.
(ch_base::copy_headers): Pass around the invariant edges hash set.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/loop-ch-profile-1.c: Remove xfail.

riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])

Recently, two identical XTheadCondMov tests have been added, which both fail.
Let's fix that by changing the following:
* Merge both files into one (no need for separate tests for rv32 and rv64)
* Drop unrelated attribute check test (we already test for `th.mveqz`
and `th.mvnez` instructions, so there is little additional value)
* Fix the pattern to allow matching

Fixes: a1806f0918c0 ("RISC-V: Optimize TARGET_XTHEADCONDMOV")
gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved to...
* gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Removed.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

ifcvt: Change return type of predicate functions from int to bool

Also change some internal variables and function arguments from int to bool.

gcc/ChangeLog:

* ifcvt.cc (cond_exec_changed_p): Change variable to bool.
(last_active_insn): Change "skip_use_p" function argument to bool.
(noce_operand_ok): Change return type from int to bool.
(find_cond_trap): Ditto.
(block_jumps_and_fallthru_p): Change "fallthru_p" and
"jump_p" variables to bool.
(noce_find_if_block): Change return type from int to bool.
(cond_exec_find_if_block): Ditto.
(find_if_case_1): Ditto.
(find_if_case_2): Ditto.
(dead_or_predicable): Ditto. Change "reversep" function arg to bool.
(block_jumps_and_fallthru): Rename from block_jumps_and_fallthru_p.
(cond_exec_process_insns): Change return type from int to bool.
Change "mod_ok" function arg to bool.
(cond_exec_process_if_block): Change return type from int to bool.
Change "do_multiple_p" function arg to bool.  Change "then_mod_ok"
variable to bool.
(noce_emit_store_flag): Change return type from int to bool.
Change "reversep" function arg to bool.  Change "cond_complex"
variable to bool.
(noce_try_move): Change return type from int to bool.
(noce_try_ifelse_collapse): Ditto.
(noce_try_store_flag): Ditto. Change "reversep" variable to bool.
(noce_try_addcc): Change return type from int to bool.  Change
"subtract" variable to bool.
(noce_try_store_flag_constants): Change return type from int to bool.
(noce_try_store_flag_mask): Ditto.  Change "reversep" variable to bool.
(noce_try_cmove): Change return type from int to bool.
(noce_try_cmove_arith): Ditto. Change "is_mem" variable to bool.
(noce_try_minmax): Change return type from int to bool.  Change
"unsignedp" variable to bool.
(noce_try_abs): Change return type from int to bool.  Change
"negate" variable to bool.
(noce_try_sign_mask): Change return type from int to bool.
(noce_try_move): Ditto.
(noce_try_store_flag_constants): Ditto.
(noce_try_cmove): Ditto.
(noce_try_cmove_arith): Ditto.
(noce_try_minmax): Ditto.  Change "unsignedp" variable to bool.
(noce_try_bitop): Change return type from int to bool.
(noce_operand_ok): Ditto.
(noce_convert_multiple_sets): Ditto.
(noce_convert_multiple_sets_1): Ditto.
(noce_process_if_block): Ditto.
(check_cond_move_block): Ditto.
(cond_move_process_if_block): Ditto. Change "success_p"
variable to bool.
(rest_of_handle_if_conversion): Change return type to void.

VECT: Apply COND_LEN_* into vectorizable_operation

Hi, Richard and Richi.
As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* to the following situation:

The following code in "vectorizable_operation":
  /* If operating on inactive elements could generate spurious traps,
     we need to restrict the operation to active lanes.  Note that this
     specifically doesn't apply to unhoisted invariants, since they
     operate on the same value for every lane.

     Similarly, if this operation is part of a reduction, a fully-masked
     loop should only change the active lanes of the reduction chain,
     keeping the inactive lanes as-is.  */
  bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
    || reduc_idx >= 0);

That is, the case where mask_out_inactive is true under length loop control.

So, we can handle the following 2 cases:

1. Integer division:

   #define TEST_TYPE(TYPE) \
   __attribute__((noipa)) \
   void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
   { \
     for (int i = 0; i < n; i++) \
       dst[i] = a[i] % b[i]; \
   }
   #define TEST_ALL() \
   TEST_TYPE(int8_t) \
   TEST_ALL()

With this patch:

  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

2. Floating-point arithmetic **WITHOUT** -ffast-math

   #define TEST_TYPE(TYPE) \
   __attribute__((noipa)) \
   void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
   { \
     for (int i = 0; i < n; i++) \
       dst[i] = a[i] + b[i]; \
   }
   #define TEST_ALL() \
   TEST_TYPE(float) \
   TEST_ALL()

With this patch:

  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

With this patch, we can make sure operations won't trap on the inactive elements selected by "mask_out_inactive".
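
As a reading aid, the per-lane behaviour of the .COND_LEN_ADD calls in the dumps above can be modelled in scalar C. This is only a sketch: the function name cond_len_add_model and the bool-array mask representation are assumptions made for illustration, and the lane rule (active when i < len + bias and the mask bit is set, otherwise take the "else" operand) is my reading of the operand order shown in the dump, not code taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scalar model (hypothetical helper, not GCC code) of
   .COND_LEN_ADD (mask, a, b, else, len, bias) over NLANES lanes.
   Lane i is active when i < len + bias and mask[i] is set.  Inactive
   lanes copy the "else" operand and never evaluate the operation,
   so they cannot trap.  */
static void
cond_len_add_model (float *res, const bool *mask, const float *a,
                    const float *b, const float *else_vals,
                    size_t len, int bias, size_t nlanes)
{
  assert ((long) len + bias >= 0);
  for (size_t i = 0; i < nlanes; i++)
    {
      bool active = (long) i < (long) len + bias && mask[i];
      res[i] = active ? a[i] + b[i] : else_vals[i];
    }
}
```

In the dumps the mask is all ones and the "else" operand is the first input vector, so only the length operand decides which lanes are computed.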

gcc/ChangeLog:

* internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* support.
(CASE): Ditto.
(get_conditional_len_internal_fn): New function.
* internal-fn.h (get_conditional_len_internal_fn): Ditto.
* tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_*
support.

libgomp.texi: add cross ref, remove duplicated entry

libgomp/

* libgomp.texi (OpenMP 5.0): Replace '... stub' by @ref to
'Memory allocation' section which contains the full status.
(TR11): Remove differently worded duplicated entry.

i386: Fix FAIL of gcc.target/i386/pr91681-1.c

I committed the wrong version of this patch (with a typo).
Updating to the correct bootstrapped and regression tested version
as obvious.

2023-07-12 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/91681
* config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): Typo.

i386: Fix FAIL of gcc.target/i386/pr91681-1.c

The recent change in TImode parameter passing on x86_64 results in the
FAIL of pr91681-1.c. The issue is that with the extra flexibility,
the combine pass is now spoilt for choice between using either the
*add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
patterns, when one operand is a *concat and the other is a zero_extend.
The solution proposed below is to provide an *add<dwi>3_doubleword_concat_zext
define_insn_and_split, that can benefit both from the register allocation
of *concat, and still avoid the xor normally required by zero extension.

I'm investigating a follow-up refinement to improve register allocation
further by avoiding the early clobber in the =&r, and handling (custom)
reloads explicitly, but this piece resolves the testcase failure.

2023-07-12 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/91681
* config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
define_insn_and_split derived from *add<dwi>3_doubleword_concat
and *add<dwi>3_doubleword_zext.

PR target/110598: Fix rega = 0; rega ^= rega regression in i386.md

This patch fixes the regression PR target/110598 caused by my recent
addition of a peephole2.  The intention of that optimization was to
simplify zeroing a register, followed by an IOR, XOR or PLUS operation
on it into a move, or as described in the comment:
;; Peephole2 rega = 0; rega op= regb into rega = regb.

The issue is that I'd failed to consider the (rare and unusual) case,
where regb is rega, where the transformation leads to the incorrect
"rega = rega", when it should be "rega = 0".  The minimal fix is to
add a !reg_mentioned_p check to the recent peephole2.

In addition to resolving the regression, I've added a second peephole2
to optimize the problematic case above, which contains a false
dependency and is therefore tricky to optimize elsewhere.  This is an
improvement over GCC 13, for example, that generates the redundant:

        xorl    %edx, %edx
        xorq    %rdx, %rdx
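
The failure mode described above can be reproduced with a tiny register-file simulation. This is an illustrative sketch only: the two-register machine, the enum and the function names are hypothetical, and the real fix is the !reg_mentioned_p condition in the peephole2, not C code like this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-register machine, used only for illustration.  */
enum op { OP_XOR, OP_IOR, OP_PLUS };

static uint64_t
apply (enum op op, uint64_t x, uint64_t y)
{
  switch (op)
    {
    case OP_XOR: return x ^ y;
    case OP_IOR: return x | y;
    default:     return x + y;
    }
}

/* Original sequence: rega = 0; rega op= regb.  */
static void
run_original (uint64_t regs[2], enum op op, int dst, int src)
{
  assert (dst >= 0 && dst < 2 && src >= 0 && src < 2);
  regs[dst] = 0;
  regs[dst] = apply (op, regs[dst], regs[src]);
}

/* Rewritten sequence.  The plain move "rega = regb" is only valid when
   regb is a different register; when regb is rega the result is 0.  */
static void
run_peephole (uint64_t regs[2], enum op op, int dst, int src)
{
  (void) op;  /* XOR, IOR and PLUS with zero all reduce to a move.  */
  assert (dst >= 0 && dst < 2 && src >= 0 && src < 2);
  if (dst != src)
    regs[dst] = regs[src];
  else
    regs[dst] = 0;
}
```

Dropping the dst != src test in run_peephole gives the buggy "rega = rega" behaviour, which leaves the old register value in place instead of zero.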

2023-07-12  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/110598
* config/i386/i386.md (peephole2): Check !reg_mentioned_p when
optimizing rega = 0; rega op= regb for op in [XOR,IOR,PLUS].
(peephole2): Simplify rega = 0; rega op= rega cases.

gcc/testsuite/ChangeLog
PR target/110598
* gcc.target/i386/pr110598.c: New test case.

i386: Tweak ix86_expand_int_compare to use PTEST for vector equality.

I've come up with an alternate/complementary/supplementary fix to the
patch https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html
for generating the PTEST during RTL expansion, rather than rely on
this being caught/optimized later during STV.

You'll notice in this patch, the tests for TARGET_SSE4_1 and TImode
appear last.  When I was writing this, I initially also added support
for AVX VPTEST and OImode, before realizing that x86 doesn't (yet)
support 256-bit OImode (which also explains why we don't have an OImode
to V1OImode scalar-to-vector pass).  Retaining this clause ordering
should minimize the lines changed if things change in future.

2023-07-12  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_int_compare): If
testing a TImode SUBREG of a 128-bit vector register against
zero, use a PTEST instruction instead of first moving it to
a pair of scalar registers.

genopinit: Allow more than 256 modes.

Upcoming changes for RISC-V will have us exceed 255 modes or 8 bits.
This patch increases the limit to 10 bits and adjusts the hashing
function for the gen* and optabs-query lookups accordingly.
Consequently, the number of optabs is limited to 4095.
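
The new key layout can be sketched as follows. The helper name pack_optab_key and the macro names are hypothetical, and the sketch assumes a 32-bit key; only the shift amounts (optab by 20, mode by 10) come from the ChangeLog below. Two 10-bit mode fields leave 12 bits for the optab, which matches the 4095 limit.

```c
#include <assert.h>
#include <stdint.h>

#define MODE_BITS   10u  /* bits per machine mode, per this commit */
#define OPTAB_SHIFT 20u  /* optab sits above two mode fields */

/* Hypothetical helper: pack an optab and two machine modes into one
   32-bit lookup key.  */
static uint32_t
pack_optab_key (unsigned optab, unsigned to_mode, unsigned from_mode)
{
  assert (optab < (1u << (32u - OPTAB_SHIFT)));
  assert (to_mode < (1u << MODE_BITS));
  assert (from_mode < (1u << MODE_BITS));
  return ((uint32_t) optab << OPTAB_SHIFT)
         | ((uint32_t) to_mode << MODE_BITS)
         | (uint32_t) from_mode;
}
```

With the previous 8-bit fields a mode number of 256 or higher would have spilled into the neighbouring field; with 10 bits it stays distinct.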

gcc/ChangeLog:

* genopinit.cc (main): Adjust maximal number of optabs and
machine modes.
* gensupport.cc (find_optab): Shift optab by 20 and mode by
10 bits.
* optabs-query.h (optab_handler): Ditto.
(convert_optab_handler): Ditto.

libgomp: Use libnuma for OpenMP's partition=nearest allocation trait

As with the memkind library, it is only used when found at runtime;
it does not need to be present when building GCC.

The included testcase does not check whether the memory has been placed
on the nearest node as the Linux kernel memory handling too often ignores
that hint, using a different node for the allocation. However, when
running with 'numactl --preferred=<node> ./executable', it is clearly
visible that the feature works by comparing malloc/default vs. nearest
placement (using get_mempolicy to obtain the node for a mem addr).

libgomp/ChangeLog:

* allocator.c: Add ifdef for LIBGOMP_USE_LIBNUMA.
(enum gomp_numa_memkind_kind): Renamed from gomp_memkind_kind;
add GOMP_MEMKIND_LIBNUMA.
(struct gomp_libnuma_data, gomp_init_libnuma, gomp_get_libnuma): New.
(omp_init_allocator): Handle partition=nearest with libnuma if avail.
(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
numa_alloc_local (+ memset), numa_free, and numa_realloc calls as
needed.
* config/linux/allocator.c (LIBGOMP_USE_LIBNUMA): Define.
* libgomp.texi: Fix a typo; use 'fi' instead of its ligature char.
(Memory allocation): Renamed from 'Memory allocation with libmemkind';
updated for libnuma usage.
* testsuite/libgomp.c-c++-common/alloc-11.c: New test.
* testsuite/libgomp.c-c++-common/alloc-12.c: New test.

gfortran: Allow ref'ing PDT's len() in parameter-initializer.

Fix the case where declaring a parameter initialized with a pdt_len
reference did not simplify the reference to a constant.

2023-07-12 Andre Vehreschild <vehre@gcc.gnu.org>

gcc/fortran/ChangeLog:

PR fortran/102003
* expr.cc (find_inquiry_ref): Replace len of pdt_string by
constant.
(simplify_ref_chain): Ensure input to find_inquiry_ref is
NULL.
(gfc_match_init_expr): Prevent PDT analysis for function calls.
(gfc_pdt_find_component_copy_initializer): Get the initializer
value for given component.
* gfortran.h (gfc_pdt_find_component_copy_initializer): New
function.
* simplify.cc (gfc_simplify_len): Replace len() of PDT with pdt
component ref or constant.

gcc/testsuite/ChangeLog:

* gfortran.dg/pdt_33.f03: New test.

tree-optimization/110630 - enhance SLP permute support

The following enhances the existing lowpart extraction support for
SLP VEC_PERM nodes to cover all vector aligned extractions. This
allows the existing bb-slp-pr95839.c testcase to be vectorized
with mips -mpaired-single and the new bb-slp-pr95839-3.c testcase
with SSE2.

PR tree-optimization/110630
* tree-vect-slp.cc (vect_add_slp_permutation): New
offset parameter, honor that for the extract code generation.
(vectorizable_slp_permutation_1): Handle offsetted identities.

* gcc.dg/vect/bb-slp-pr95839.c: Make stricter.
* gcc.dg/vect/bb-slp-pr95839-3.c: New variant testcase.

RISC-V: Support integer mult highpart auto-vectorization

This patch adds an obviously missing mult_high auto-vectorization pattern.

Consider the following case:
void __attribute__ ((noipa)) \
mod_##TYPE (TYPE *__restrict dst, TYPE *__restrict src, int count) \
{ \
  for (int i = 0; i < count; ++i) \
    dst[i] = src[i] / 17; \
}

  T (int32_t) \

TEST_ALL (DEF_LOOP)

Before this patch:
mod_int32_t:
        ble     a2,zero,.L5
        li      a5,17
        vsetvli a3,zero,e32,m1,ta,ma
        vmv.v.x v2,a5
.L3:
        vsetvli a5,a2,e8,mf4,ta,ma
        vle32.v v1,0(a1)
        vsetvli a3,zero,e32,m1,ta,ma
        slli    a4,a5,2
        vdiv.vv v1,v1,v2
        sub     a2,a2,a5
        vsetvli zero,a5,e32,m1,ta,ma
        vse32.v v1,0(a0)
        add     a1,a1,a4
        add     a0,a0,a4
        bne     a2,zero,.L3
.L5:
        ret

After this patch:
mod_int32_t:
ble a2,zero,.L5
li a5,2021163008
addiw a5,a5,-1927
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.x v3,a5
.L3:
vsetvli a5,a2,e8,mf4,ta,ma
vle32.v v2,0(a1)
vsetvli a3,zero,e32,m1,ta,ma
slli a4,a5,2
vmulh.vv v1,v2,v3
sub a2,a2,a5
vsra.vi v2,v2,31
vsra.vi v1,v1,3
vsub.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L3
.L5:
ret

Even though a single "vdiv" is lowered into "1 vmulh + 2 vsra + 1 vsub" and
4 more instructions are generated, we believe it's much better than before
since division is very slow in hardware.
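
For readers who want to check the arithmetic, here is a scalar C sketch of what the vmulh/vsra/vsra/vsub sequence computes per element. The function names are made up for illustration; the constant is the one built by the li/addiw pair above (2021163008 - 1927 = 0x78787879). The sketch assumes that >> on a negative signed value is an arithmetic shift, which is what GCC implements.

```c
#include <assert.h>
#include <stdint.h>

static_assert (2021163008 - 1927 == 0x78787879, "magic constant for /17");

/* High 32 bits of the signed 64-bit product, like vmulh per element.  */
static int32_t
mulh32 (int32_t a, int32_t b)
{
  return (int32_t) (((int64_t) a * (int64_t) b) >> 32);
}

/* x / 17 without a divide instruction.  */
static int32_t
div17 (int32_t x)
{
  const int32_t magic = 2021163008 - 1927;  /* li + addiw */
  int32_t hi = mulh32 (x, magic);           /* vmulh.vv v1,v2,v3 */
  int32_t sign = x >> 31;                   /* vsra.vi v2,v2,31: -1 or 0 */
  int32_t q = hi >> 3;                      /* vsra.vi v1,v1,3 */
  return q - sign;                          /* vsub.vv v1,v1,v2 */
}
```

Subtracting the sign word adds 1 for negative inputs, which turns the floor result of the shifts into the truncating quotient that C division requires.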

gcc/ChangeLog:

* config/riscv/autovec.md (smul<mode>3_highpart): New pattern.
(umul<mode>3_highpart): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/mulh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh_run-2.c: New test.

x86: improve fast bfloat->float conversion

There's nothing AVX512BW-ish in here, so no reason to use Yw as the
constraints for the AVX alternative. Furthermore by using the 512-bit
form of VPSLLD (in a new alternative) all 32 registers can be used
directly by the insn without AVX512VL needing to be enabled.

Also adjust the originally last alternative's "prefix" attribute to
maybe_evex.

gcc/

* config/i386/i386.md (extendbfsf2_1): Add new AVX512F
alternative. Adjust original last alternative's "prefix"
attribute to maybe_evex.

x86: make better use of VBROADCASTSS / VPBROADCASTD

... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
never longer (yet sometimes shorter) than the corresponding VSHUFPS /
VPSHUFD, due to the immediate operand of the shuffle insns balancing the
(uniform) need for VEX3 in the broadcast ones. When EVEX encoding is
required, the broadcast insns are always shorter.

Add new alternatives to cover the AVX2 and AVX512 cases as appropriate.

While touching this anyway, switch to consistently using "sseshuf1" in
the "type" attributes for all shuffle forms.

gcc/

* config/i386/sse.md (vec_dupv4sf): Make first alternative use
vbroadcastss for AVX2. New AVX512F alternative.
(*vec_dupv4si): New AVX2 and AVX512F alternatives using
vpbroadcastd. Replace sselog1 by sseshuf1 in "type" attribute.

gcc/testsuite/

* gcc.target/i386/avx2-dupv4sf.c: New test.
* gcc.target/i386/avx2-dupv4si.c: Likewise.
* gcc.target/i386/avx512f-dupv4sf.c: Likewise.
* gcc.target/i386/avx512f-dupv4si.c: Likewise.

riscv: thead: Factor out XThead*-specific peepholes

This patch moves the XThead*-specific peephole passes
into thead-peephole.md with the intent to keep vendor-specific
code separated from RISC-V standard code.

This patch does not contain any functional changes.

gcc/ChangeLog:

* config/riscv/peephole.md: Remove XThead* peephole passes.
* config/riscv/thead.md: Include thead-peephole.md.
* config/riscv/thead-peephole.md: New file.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

riscv: Prepare backend for index registers

RISC-V does not currently support index registers.
However, there are some vendor extensions that specify them.
Let's do the necessary changes in the backend so that we can
add support for such a vendor extension in the future.

This is a non-functional change without any intended side-effects.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_regno_ok_for_index_p):
New prototype.
(riscv_index_reg_class): Likewise.
* config/riscv/riscv.cc (riscv_regno_ok_for_index_p): New function.
(riscv_index_reg_class): New function.
* config/riscv/riscv.h (INDEX_REG_CLASS): Call new function
riscv_index_reg_class().
(REGNO_OK_FOR_INDEX_P): Call new function
riscv_regno_ok_for_index_p().

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

riscv: Move address classification info types to riscv-protos.h

enum riscv_address_type and struct riscv_address_info are used
to store address classification information. Let's move these types
into our common header file in order to share them with other
compilation units.

This is a non-functional change without any intended side-effects.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum riscv_address_type):
New location of type definition.
(struct riscv_address_info): Likewise.
* config/riscv/riscv.cc (enum riscv_address_type):
Old location of type definition.
(struct riscv_address_info): Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

riscv: Define Xmode macro

Define an Xmode macro that specifies the register size (XLEN),
similar to Pmode. This allows the backend code to write generic
RV32/RV64 C code (under certain circumstances).

gcc/ChangeLog:

* config/riscv/riscv.h (Xmode): New macro.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

riscv: Simplify output of MEM addresses

We have the following situation for MEM RTX objects:
* TARGET_PRINT_OPERAND expands to riscv_print_operand()
* This falls into the default case (unknown or no letter) of the outer
  switch-case-block and the MEM case of the inner switch-case-block and
  calls output_address() in final.cc with XEXP (op, 0) (the address)
* This calls targetm.asm_out.print_operand_address() which is
  riscv_print_operand_address()
* riscv_print_operand_address() is targeting the address of a MEM RTX
* riscv_print_operand_address() calls riscv_print_operand() for the offset
  and directly prints the register if the address is classified as ADDRESS_REG
* This falls into the default case (unknown or no letter) of the outer
  switch-case-block and the default case of the inner switch-case-block and
  calls output_addr_const().

However, since we know that offset must be a CONST_INT (which will be
followed by a '(<reg>)' string), there is no need to call
riscv_print_operand() for the offset.
Instead we can take the shortcut and use output_addr_const().

This change also brings the code in riscv_print_operand_address()
in line with the other cases, where output_addr_const() is used
to print offsets.

Tested with GCC regression test suite and SPEC intrate.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand_address): Use
output_addr_const rather than riscv_print_operand.

riscv: thead: Adjust constraints of th_addsl INSN

A recent change adjusted the constraints of ZBA's shNadd INSN.
Let's mirror this change here as well.

gcc/ChangeLog:

* config/riscv/thead.md: Adjust constraints of th_addsl.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

riscv: xtheadmempair: Fix doc for th_mempair_order_operands()

There is an incorrect sentence in the documentation of the function
th_mempair_order_operands(). Let's remove it.

gcc/ChangeLog:

* config/riscv/thead.cc (th_mempair_operands_p):
Fix documentation of th_mempair_order_operands().

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

riscv: xtheadmempair: Fix CFA reg notes

The current implementation triggers an assertion in
dwarf2out_frame_debug_cfa_offset() under certain circumstances.
The standard code uses REG_FRAME_RELATED_EXPR notes instead
of REG_CFA_OFFSET notes when saving registers on the stack.
So let's do this as well.

gcc/ChangeLog:

* config/riscv/thead.cc (th_mempair_save_regs):
Emit REG_FRAME_RELATED_EXPR notes in prologue.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

riscv: xtheadbb: Add sign/zero extension support for th.ext and th.extu

The current support of the bitfield-extraction instructions
th.ext and th.extu (XTheadBb extension) only covers sign_extract
and zero_extract. This patch adds support for sign_extend and
zero_extend to avoid any shifts for sign or zero extensions.
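
To illustrate why a bit-field extract can stand in for an extension, here is a scalar C model. It is a sketch under assumptions: the names ext_model and extu_model are hypothetical, and the (msb, lsb) operand meaning is my understanding of th.ext/th.extu rather than something stated in this commit. The signed variant also relies on GCC's wrapping conversion from uint64_t to int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extending extract of bits [msb:lsb] (models th.extu).  */
static uint64_t
extu_model (uint64_t x, unsigned msb, unsigned lsb)
{
  assert (msb < 64 && lsb <= msb);
  unsigned width = msb - lsb + 1;
  uint64_t field = x >> lsb;
  if (width < 64)
    field &= (UINT64_C (1) << width) - 1;
  return field;
}

/* Sign-extending extract of bits [msb:lsb] (models th.ext).  */
static int64_t
ext_model (uint64_t x, unsigned msb, unsigned lsb)
{
  uint64_t field = extu_model (x, msb, lsb);
  uint64_t sign = UINT64_C (1) << (msb - lsb);
  /* Flip the field's top bit, then subtract it again: this propagates
     the top bit through all higher bits.  */
  return (int64_t) ((field ^ sign) - sign);
}
```

Under this model, sign-extending a 16-bit value is ext_model (x, 15, 0) and zero-extending a 32-bit value is extu_model (x, 31, 0), so no shift pair is needed.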

gcc/ChangeLog:

* config/riscv/riscv.md: No base-ISA extension splitter for XThead*.
* config/riscv/thead.md (*extend<SHORT:mode><SUPERQI:mode>2_th_ext):
New XThead extension INSN.
(*zero_extendsidi2_th_extu): New XThead extension INSN.
(*zero_extendhi<GPR:mode>2_th_extu): New XThead extension INSN.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadbb-ext-1.c: New test.
* gcc.target/riscv/xtheadbb-extu-1.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

Break false dependence for vpternlog by inserting vpxor or setting constraint of input operand to '0'

A false dependency occurs when the destination is only written by
vpternlog. There is no false dependency when the destination is also used
as a source. So either a vpxor should be inserted, or the input operand
should be given constraint '0'.
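
The reason the dependency is false can be seen from a bitwise model of the instruction. The sketch below is illustrative only (ternlog32 is a made-up name, modelling one 32-bit lane): each result bit is looked up in the 8-bit immediate using the three input bits as an index, so an all-ones table (0xFF) ignores every input, yet the hardware still waits for the old destination value.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise model of one 32-bit lane of vpternlog.  A is the destination
   (also the first source), B and C are the other two sources.  */
static uint32_t
ternlog32 (uint32_t a, uint32_t b, uint32_t c, uint8_t imm)
{
  uint32_t result = 0;
  for (int bit = 0; bit < 32; bit++)
    {
      unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     | ((c >> bit) & 1u);
      result |= (uint32_t) ((imm >> idx) & 1u) << bit;
    }
  return result;
}

/* Usage: the all-ones table yields the same value for any inputs.  */
void
ternlog32_all_ones_example (void)
{
  assert (ternlog32 (0x12345678u, 0xdeadbeefu, 0u, 0xFF) == 0xFFFFFFFFu);
  assert (ternlog32 (0u, 0u, 0u, 0xFF) == 0xFFFFFFFFu);
}
```

A table of 0x55 computes the complement of the third operand alone, which is the same situation for the one's-complement patterns: the first two operands are architecturally read but do not affect the result.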

gcc/ChangeLog:

PR target/110438
PR target/110202
* config/i386/predicates.md
(int_float_vector_all_ones_operand): New predicate.
* config/i386/sse.md (*vmov<mode>_constm1_pternlog_false_dep): New
define_insn.
(*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep):
Ditto.
(*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep):
Ditto.
(*<avx512>_cvtmask2<ssemodesuffix><mode>): Adjust to
define_insn_and_split to avoid false dependence.
(*<avx512>_cvtmask2<ssemodesuffix><mode>): Ditto.
(<mask_codefor>one_cmpl<mode>2<mask_name>): Adjust constraint
of operands 1 to '0' to avoid false dependence.
(*andnot<mode>3): Ditto.
(iornot<mode>3): Ditto.
(*<nlogic><mode>3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110438.c: New test.
* gcc.target/i386/pr100711-6.c: Adjust testcase.

Initial Granite Rapids D Support

gcc/ChangeLog:

* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle Granite Rapids D.
* common/config/i386/i386-common.cc:
(processor_alias_table): Add graniterapids-d.
* common/config/i386/i386-cpuinfo.h
(enum processor_subtypes): Add INTEL_COREI7_GRANITERAPIDS_D.
* config.gcc: Add -march=graniterapids-d.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Handle graniterapids-d.
* config/i386/i386.h: (PTA_GRANITERAPIDS_D): New.
* doc/extend.texi: Add graniterapids-d.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv16.C: Add graniterapids-d.
* gcc.target/i386/funcspec-56.inc: Handle new march.

i386: Guard 128 bit VAES builtins with AVX512VL

Since commit 24a8acc, the 128-bit intrinsics are enabled for VAES. However,
AVX512VL is not checked until we reach the pattern, which results in an
ICE.

Add an AVX512VL guard at the builtin so that an error is reported when
checking ISA flags.

gcc/ChangeLog:

* config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins):
Add OPTION_MASK_ISA_AVX512VL.
* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512vl-vaes-1.c: New test.

MAINTAINERS: Add myself to write after approval

ChangeLog:

* MAINTAINERS: Add Hao Liu to write after approval

MAINTAINERS: Add myself to write after approval

ChangeLog:

* MAINTAINERS: Add Ken Matsui to write after approval

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>

Daily bump.

RISC-V: Optimize permutation codegen with vcompress

This patch recognizes specific permutation patterns to which the compress approach can be applied.

Consider the following case:
typedef int8_t vnx64i __attribute__ ((vector_size (64)));
  1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31,    \
    37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81,    \
    82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,    \
    100, 101, 102, 103, 104, 105, 106, 107
void __attribute__ ((noinline, noclone)) test_1 (int8_t *x, int8_t *y, int8_t *out)
{
  vnx64i v1 = *(vnx64i*)x;
  vnx64i v2 = *(vnx64i*)y;
  vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64);
  *(vnx64i*)out = v3;
}

https://godbolt.org/z/P33nev6cW

Before this patch:
        lui     a4,%hi(.LANCHOR0)
        addi    a4,a4,%lo(.LANCHOR0)
        vl4re8.v        v4,0(a4)
        li      a4,64
        vsetvli a5,zero,e8,m4,ta,mu
        vl4re8.v        v20,0(a0)
        vl4re8.v        v16,0(a1)
        vmv.v.x v12,a4
        vrgather.vv     v8,v20,v4
        vmsgeu.vv       v0,v4,v12
        vsub.vv v4,v4,v12
        vrgather.vv     v8,v16,v4,v0.t
        vs4r.v  v8,0(a2)
        ret

After this patch:
        lui a4,%hi(.LANCHOR0)
        addi a4,a4,%lo(.LANCHOR0)
        vsetvli a5,zero,e8,m4,ta,ma
        vl4re8.v v12,0(a1)
        vl4re8.v v8,0(a0)
        vlm.v v0,0(a4)
        vslideup.vi v4,v12,20
        vcompress.vm v4,v8,v0
        vs4r.v v4,0(a2)
        ret
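
The effect of the vslideup + vcompress pair can be followed with a scalar model of the compress step. This is a hedged sketch: compress_model is a hypothetical name, and the model assumes that destination elements past the packed ones keep their previous contents, which is what lets the slid-up elements of the second input survive in the sequence above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of vcompress.vm vd, vs2, vs1: elements of SRC whose mask
   bit is set are packed, in order, into the lowest elements of DST.
   Elements of DST past the packed ones are left unchanged in this model.
   Returns the number of packed elements.  */
static size_t
compress_model (int8_t *dst, const int8_t *src, const bool *mask, size_t n)
{
  assert (dst != src);  /* destination must not overlap the source */
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      dst[out++] = src[i];
  return out;
}
```

In the example above, the mask loaded by vlm.v selects the elements taken from the first input, while the tail of the result comes from the second input that was placed there beforehand by vslideup.vi.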

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum insn_type): Add vcompress optimization.
* config/riscv/riscv-v.cc (emit_vlmax_compress_insn): Ditto.
(shuffle_compress_patterns): Ditto.
(expand_vec_perm_const_1): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-6.c: New test.

testsuite: Skip failing analyzer tests on AIX.

Some of the analyzer out-of-bounds-diagram tests fail on AIX.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-diagram-4.c: Skip on AIX.
* gcc.dg/analyzer/out-of-bounds-diagram-5-ascii.c: Same.
* gcc.dg/analyzer/out-of-bounds-diagram-5-unicode.c: Same.
* gcc.dg/analyzer/out-of-bounds-diagram-7.c: Same.
* gcc.dg/analyzer/out-of-bounds-diagram-13.c: Same.
* gcc.dg/analyzer/out-of-bounds-diagram-15.c: Same.

Fortran: formal symbol attributes for intrinsic procedures [PR110288]

gcc/fortran/ChangeLog:

PR fortran/110288
* symbol.cc (gfc_copy_formal_args_intr): When deriving the formal
argument attributes from the actual ones for intrinsic procedure
calls, take special care of CHARACTER arguments so that we do not
wrongly treat them formally as deferred-length.

gcc/testsuite/ChangeLog:

PR fortran/110288
* gfortran.dg/findloc_10.f90: New test.

cfg+gcse: Change return type of predicate functions from int to bool

Also change some internal variables from int to bool.

gcc/ChangeLog:

* cfghooks.cc (verify_flow_info): Change "err" variable to bool.
* cfghooks.h (struct cfg_hooks): Change return type of
verify_flow_info from integer to bool.
* cfgrtl.cc (can_delete_note_p): Change return type from int to bool.
(can_delete_label_p): Ditto.
(rtl_verify_flow_info): Change return type from int to bool
and adjust function body accordingly. Change "err" variable to bool.
(rtl_verify_flow_info_1): Ditto.
(free_bb_for_insn): Change return type to void.
(rtl_merge_blocks): Change "b_empty" variable to bool.
(try_redirect_by_replacing_jump): Change "fallthru" variable to bool.
(verify_hot_cold_block_grouping): Change return type from int to bool.
Change "err" variable to bool.
(rtl_verify_edges): Ditto.
(rtl_verify_bb_insns): Ditto.
(rtl_verify_bb_pointers): Ditto.
(rtl_verify_bb_insn_chain): Ditto.
(rtl_verify_fallthru): Ditto.
(rtl_verify_bb_layout): Ditto.
(purge_all_dead_edges): Change "purged" variable to bool.
* cfgrtl.h (free_bb_for_insn): Change return type from int to void.
* postreload-gcse.cc (expr_hasher::equal): Change "equiv_p" to bool.
(load_killed_in_block_p): Change return type from int to bool
and adjust function body accordingly.
(oprs_unchanged_p): Return true/false.
(rest_of_handle_gcse2): Change return type to void.
* tree-cfg.cc (gimple_verify_flow_info): Change return type from
int to bool. Change "err" variable to bool.

rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector built-in tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

This patch reworks the tests into a series of files for related tests.
The new tests consist of a runnable test to verify the built-in argument
types and the functional correctness of each built-in.  There is also a
compile only test that verifies the built-ins generate the expected number
of instructions for the various built-in tests.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.c: New test
file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.

testsuite: Require vectors of doubles for pr97428.c

The pr97428.c test assumes support for vectors of doubles, but some
targets only support vectors of floats, causing this test to fail with
such targets. Limit this test to targets that support vectors of
doubles then.

gcc/testsuite/
* gcc.dg/vect/pr97428.c: Limit to `vect_double' targets.

[modula2] Improve uninitialized variable analysis by combining basic blocks

This patch combines basic blocks for static analysis of uninitialized
variables providing that they are not the top of a loop, are not reached
by a conditional and are not reached after a procedure call.  It also
avoids checking array accesses for static analysis.  Finally the patch
adds switch modifiers to allow static analysis to include conditional
branches for subsequent basic block analysis.

gcc/ChangeLog:

* doc/gm2.texi (-Wuninit-variable-checking=): New item.

gcc/m2/ChangeLog:

* gm2-compiler/M2BasicBlock.def (InitBasicBlocksFromRange): New
parameter ScopeSym.
* gm2-compiler/M2BasicBlock.mod (ConvertQuads2BasicBlock): New
parameter ScopeSym.
(InitBasicBlocksFromRange): New parameter ScopeSym.  Call
ConvertQuads2BasicBlock with ScopeSym.
(DisplayBasicBlocks): Uncomment.
* gm2-compiler/M2Code.mod: Replace VariableAnalysis with
ScopeBlockVariableAnalysis.
(InitialDeclareAndOptiomize): Add parameter scope.
(SecondDeclareAndOptimize): Add parameter scope.
* gm2-compiler/M2GCCDeclare.mod (DeclareConstructor): Add scope
parameter to DeclareTypesConstantsProceduresInRange.
(DeclareTypesConstantsProceduresInRange): New parameter scope.
Pass scope to DisplayQuadRange.  Reformatted.
* gm2-compiler/M2GenGCC.def (ConvertQuadsToTree): New parameter
scope.
* gm2-compiler/M2GenGCC.mod (ConvertQuadsToTree): New parameter
scope.
* gm2-compiler/M2Optimize.mod (KnownReachable): New parameter
scope.
* gm2-compiler/M2Options.def (SetUninitVariableChecking): Add
arg parameter.
* gm2-compiler/M2Options.mod (SetUninitVariableChecking): Add
arg parameter and set boolean UninitVariableChecking and
UninitVariableConditionalChecking.
(UninitVariableConditionalChecking): New boolean set to FALSE.
* gm2-compiler/M2Quads.def (IsGoto): New procedure function.
(DisplayQuadRange): Add scope parameter.
(LoopAnalysis): Add scope parameter.
* gm2-compiler/M2Quads.mod: Import PutVarArrayRef.
(IsGoto): New procedure function.
(LoopAnalysis): Add scope parameter and use MetaErrorT1 instead
of WarnStringAt.
(BuildStaticArray): Call PutVarArrayRef.
(BuildDynamicArray): Call PutVarArrayRef.
(DisplayQuadRange): Add scope parameter.
(GetM2OperatorDesc): Add relational condition cases.
* gm2-compiler/M2Scope.def (ScopeProcedure): Add parameter.
* gm2-compiler/M2Scope.mod (DisplayScope): Pass scopeSym to
DisplayQuadRange.
(ForeachScopeBlockDo): Pass scopeSym to p.
* gm2-compiler/M2SymInit.def (VariableAnalysis): Rename to ...
(ScopeBlockVariableAnalysis): ... this.
* gm2-compiler/M2SymInit.mod (ScopeBlockVariableAnalysis): Add
scope parameter.
(bbEntry): New pointer to record.
(bbArray): New array.
(bbFreeList): New variable.
(errorList): New list.
(IssueConditional): New procedure.
(GenerateNoteFlow): New procedure.
(IssueWarning): New procedure.
(IsUniqueWarning): New procedure.
(CheckDeferredRecordAccess): Re-implement.
(CheckBinary): Add warning and lst parameters.
(CheckUnary): Add warning and lst parameters.
(CheckXIndr): Add warning and lst parameters.
(CheckIndrX): Add warning and lst parameters.
(CheckBecomes): Add warning and lst parameters.
(CheckComparison): Add warning and lst parameters.
(CheckReadBeforeInitQuad): Add warning and lst parameters to all
Check procedures.  Add all case quadruple clauses.
(FilterCheckReadBeforeInitQuad): Add warning and lst parameters.
(CheckReadBeforeInitFirstBasicBlock): Add warning and lst parameters.
(bbArrayKill): New procedure.
(DumpBBEntry): New procedure.
(DumpBBArray): New procedure.
(DumpBBSequence): New procedure.
(TestBBSequence): New procedure.
(CreateBBPermultations): New procedure.
(ScopeBlockVariableAnalysis): New procedure.
(GetOp3): New procedure.
(GenerateCFG): New procedure.
(NewEntry): New procedure.
(AppendEntry): New procedure.
(init): Initialize bbFreeList and errorList.
* gm2-compiler/SymbolTable.def (PutVarArrayRef): New procedure.
(IsVarArrayRef): New procedure function.
* gm2-compiler/SymbolTable.mod (SymVar): ArrayRef new field.
(MakeVar): Set ArrayRef to FALSE.
(PutVarArrayRef): New procedure.
(IsVarArrayRef): New procedure function.
* gm2-gcc/init.cc (_M2_M2SymInit_init): New prototype.
(init_PerCompilationInit): Add call to _M2_M2SymInit_init.
* gm2-gcc/m2options.h (M2Options_SetUninitVariableChecking):
New definition.
* gm2-lang.cc (gm2_langhook_handle_option): Add new case
OPT_Wuninit_variable_checking_.
* lang.opt: Wuninit-variable-checking= new entry.

gcc/testsuite/ChangeLog:

* gm2/switches/uninit-variable-checking/cascade/fail/cascadedif.mod: New test.
* gm2/switches/uninit-variable-checking/cascade/fail/switches-uninit-variable-checking-cascade-fail.exp:
New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space

libgomp/

* allocator.c (omp_init_allocator): Use malloc for
omp_high_bw_mem_space when the memkind lib is unavailable
instead of returning omp_null_allocator.
* libgomp.texi (OpenMP 5.0): Fix typo.
(Memory allocation with libmemkind): Document implementation
in more detail.

c++: coercing variable template from current inst [PR110580]

Here during ahead of time coercion of the variable template-id v1<int>,
since we pass only the innermost arguments to coerce_template_parms (and
outer arguments are still dependent at this point), substitution of the
default template argument V=U just lowers U from level 2 to level 1 rather
than replacing it with int as expected. Thus after coercion we incorrectly
end up with (effectively) v1<int, T> instead of v1<int, int>.

Coercion of a class/alias template-id on the other hand always passes
all levels of arguments, which avoids this issue. So this patch makes us
do the same for variable template-ids.

PR c++/110580

gcc/cp/ChangeLog:

* pt.cc (lookup_template_variable): Pass all levels of arguments
to coerce_template_parms, and use the parameters from the most
general template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ83.C: New test.

Fix typo in the testcase.

Antony Polukhin 2023-07-11 09:51:58 UTC
There's a typo at https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/g%2B%2B.target/i386/pr110170.C;h=e638b12a5ee2264ecef77acca86432a9f24b103b;hb=d41a57c46df6f8f7dae0c0a8b349e734806a837b#l87

It should be `|| !test3() || !test3r()` rather than `|| !test3() || !test4r()`

gcc/testsuite/ChangeLog:

PR target/110170
* g++.target/i386/pr110170.C: Fix typo.

VECT: Add COND_LEN_* operations for loop control with length targets

Hi, Richard and Richi.

This patch adds cond_len_* operation patterns for targets that support loop control with length.

These patterns will be used in the following cases:

1. Integer division:
   void
   f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int n)
   {
     for (int i = 0; i < n; ++i)
       {
         a[i] = b[i] / c[i];
       }
   }

  ARM SVE IR:

  ...
  max_mask_36 = .WHILE_ULT (0, bnd.5_32, { 0, ... });

  Loop:
  ...
  # loop_mask_29 = PHI <next_mask_37(4), max_mask_36(3)>
  ...
  vect__4.8_28 = .MASK_LOAD (_33, 32B, loop_mask_29);
  ...
  vect__6.11_25 = .MASK_LOAD (_20, 32B, loop_mask_29);
  vect__8.12_24 = .COND_DIV (loop_mask_29, vect__4.8_28, vect__6.11_25, vect__4.8_28);
  ...
  .MASK_STORE (_1, 32B, loop_mask_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...

  For targets like RVV that support loop control with length, we want to see IR as follows:

  Loop:
  ...
  # loop_len_29 = SELECT_VL
  ...
  vect__4.8_28 = .LEN_MASK_LOAD (_33, 32B, loop_len_29);
  ...
  vect__6.11_25 = .LEN_MASK_LOAD (_20, 32B, loop_len_29);
  vect__8.12_24 = .COND_LEN_DIV (dummy_mask, vect__4.8_28, vect__6.11_25, vect__4.8_28, loop_len_29, bias);
  ...
  .LEN_MASK_STORE (_1, 32B, loop_len_29, vect__8.12_24);
  ...
  next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
  ...

  Notice that here we use dummy_mask = { -1, -1, .... , -1 }

2. Integer conditional division:
   Similar to case (1) but with a condition:
   void
   f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t * cond, int n)
   {
     for (int i = 0; i < n; ++i)
       {
         if (cond[i])
           a[i] = b[i] / c[i];
       }
   }

   ARM SVE:
   ...
   max_mask_76 = .WHILE_ULT (0, bnd.6_52, { 0, ... });

   Loop:
   ...
   # loop_mask_55 = PHI <next_mask_77(5), max_mask_76(4)>
   ...
   vect__4.9_56 = .MASK_LOAD (_51, 32B, loop_mask_55);
   mask__29.10_58 = vect__4.9_56 != { 0, ... };
   vec_mask_and_61 = loop_mask_55 & mask__29.10_58;
   ...
   vect__6.13_62 = .MASK_LOAD (_24, 32B, vec_mask_and_61);
   ...
   vect__8.16_66 = .MASK_LOAD (_1, 32B, vec_mask_and_61);
   vect__10.17_68 = .COND_DIV (vec_mask_and_61, vect__6.13_62, vect__8.16_66, vect__6.13_62);
   ...
   .MASK_STORE (_2, 32B, vec_mask_and_61, vect__10.17_68);
   ...
   next_mask_77 = .WHILE_ULT (_3, bnd.6_52, { 0, ... });

   Here, ARM SVE uses vec_mask_and_61 = loop_mask_55 & mask__29.10_58; to guarantee the correct result.

   However, a target with length control cannot perform this elegant flow. For RVV, we would expect:

   Loop:
   ...
   loop_len_55 = SELECT_VL
   ...
   mask__29.10_58 = vect__4.9_56 != { 0, ... };
   ...
   vect__10.17_68 = .COND_LEN_DIV (mask__29.10_58, vect__6.13_62, vect__8.16_66, vect__6.13_62, loop_len_55, bias);
   ...

   Here we expect COND_LEN_DIV to be predicated by a real mask, which is the outcome of the comparison mask__29.10_58 = vect__4.9_56 != { 0, ... };
   and by a real length, which is produced by the loop control: loop_len_55 = SELECT_VL

3. Conditional floating-point operations (no -ffast-math):

    void
    f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
    {
      for (int i = 0; i < n; ++i)
        {
          if (cond[i])
            a[i] = b[i] + a[i];
        }
    }

  ARM SVE IR:
  max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });

  ...
  # loop_mask_49 = PHI <next_mask_71(4), max_mask_70(3)>
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  vec_mask_and_55 = loop_mask_49 & mask__27.10_52;
  ...
  vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, vect__6.13_56);
  ...
  next_mask_71 = .WHILE_ULT (_22, bnd.6_46, { 0, ... });
  ...

  For RVV, we would expect IR:

  ...
  loop_len_49 = SELECT_VL
  ...
  mask__27.10_52 = vect__4.9_50 != { 0, ... };
  ...
  vect__9.17_62 = .COND_LEN_ADD (mask__27.10_52, vect__6.13_56, vect__8.16_60, vect__6.13_56, loop_len_49, bias);
  ...

4. Conditional un-ordered reduction:

   int32_t
   f (int32_t *restrict a,
   int32_t *restrict cond, int n)
   {
     int32_t result = 0;
     for (int i = 0; i < n; ++i)
       {
         if (cond[i])
           result += a[i];
       }
     return result;
   }

   ARM SVE IR:

     Loop:
     # vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
     ...
     # loop_mask_40 = PHI <next_mask_58(4), max_mask_57(3)>
     ...
     mask__17.11_43 = vect__4.10_41 != { 0, ... };
     vec_mask_and_46 = loop_mask_40 & mask__17.11_43;
     ...
     vect__33.16_51 = .COND_ADD (vec_mask_and_46, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37);
     ...
     next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
     ...

     Epilogue:
     _53 = .REDUC_PLUS (vect__33.16_51); [tail call]

   For RVV, we expect:

    Loop:
     # vect_result_18.7_37 = PHI <vect__33.16_51(4), { 0, ... }(3)>
     ...
     loop_len_40 = SELECT_VL
     ...
     mask__17.11_43 = vect__4.10_41 != { 0, ... };
     ...
     vect__33.16_51 = .COND_LEN_ADD (mask__17.11_43, vect_result_18.7_37, vect__7.14_47, vect_result_18.7_37, loop_len_40, bias);
     ...
     next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
     ...

     Epilogue:
     _53 = .REDUC_PLUS (vect__33.16_51); [tail call]

     I named these patterns "cond_len_*" because I want the length operand to come after the mask operand, and all operands other than the length operand
     to keep the same order as in the "cond_*" patterns. This order will make the upcoming loop vectorizer support easier.
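
     As an illustration only, the per-lane behaviour described above can be written as a scalar
     reference model. This is a sketch under assumptions, not code from the patch: the helper name
     cond_len_div_ref is hypothetical, the mask is modelled as one byte per lane, the operand order
     (mask, two inputs, fallback, length, bias) follows the IR shown above, and the bias is assumed
     to be 0 or -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference model of a COND_LEN_DIV-style operation (hypothetical
   helper, not a GCC function).  Lane i takes a[i] / b[i] only when it is
   both inside the active length (len + bias) and enabled by the mask;
   every other lane takes the fallback value else_vals[i].  */
static void
cond_len_div_ref (int32_t *dst, const uint8_t *mask, const int32_t *a,
                  const int32_t *b, const int32_t *else_vals,
                  size_t nunits, size_t len, ptrdiff_t bias)
{
  /* Assumption: the bias is 0 or -1 and never makes the length negative.  */
  assert (bias == 0 || (bias == -1 && len > 0));
  size_t active = (size_t) ((ptrdiff_t) len + bias);

  for (size_t i = 0; i < nunits; i++)
    /* The division is only evaluated for active lanes, so a zero divisor
       in a masked-off or out-of-length lane is harmless.  */
    dst[i] = (i < active && mask[i]) ? a[i] / b[i] : else_vals[i];
}
```

     Case (1) corresponds to calling this model with an all-ones mask, and case (2) to passing the
     comparison result as the mask; in both cases the length comes from the loop control.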

gcc/ChangeLog:

* doc/md.texi: Add COND_LEN_* operations for loop control with length.
* internal-fn.cc (cond_len_unary_direct): Ditto.
(cond_len_binary_direct): Ditto.
(cond_len_ternary_direct): Ditto.
(expand_cond_len_unary_optab_fn): Ditto.
(expand_cond_len_binary_optab_fn): Ditto.
(expand_cond_len_ternary_optab_fn): Ditto.
(direct_cond_len_unary_optab_supported_p): Ditto.
(direct_cond_len_binary_optab_supported_p): Ditto.
(direct_cond_len_ternary_optab_supported_p): Ditto.
* internal-fn.def (COND_LEN_ADD): Ditto.
(COND_LEN_SUB): Ditto.
(COND_LEN_MUL): Ditto.
(COND_LEN_DIV): Ditto.
(COND_LEN_MOD): Ditto.
(COND_LEN_RDIV): Ditto.
(COND_LEN_MIN): Ditto.
(COND_LEN_MAX): Ditto.
(COND_LEN_FMIN): Ditto.
(COND_LEN_FMAX): Ditto.
(COND_LEN_AND): Ditto.
(COND_LEN_IOR): Ditto.
(COND_LEN_XOR): Ditto.
(COND_LEN_SHL): Ditto.
(COND_LEN_SHR): Ditto.
(COND_LEN_FMA): Ditto.
(COND_LEN_FMS): Ditto.
(COND_LEN_FNMA): Ditto.
(COND_LEN_FNMS): Ditto.
(COND_LEN_NEG): Ditto.
* optabs.def (OPTAB_D): Ditto.

tree-optimization/110614 - SLP splat and re-align (optimized)

The following properly guards the re-align (optimized) paths used
on old power CPUs for the added case of SLP splats from non-grouped
loads. Testcases already exist in dg-torture.

PR tree-optimization/110614
* tree-vect-data-refs.cc (vect_supportable_dr_alignment):
SLP splats are not suitable for re-align ops.

ada: Avoid renaming_decl in case of constrained array

This patch avoids rewriting "X: S := F(...);" as "X: S renames F(...);".
That rewrite is incorrect if S is a constrained array subtype,
because it changes the semantics. In the original, the
bounds of X are those of S. But constraints are ignored in
renamings, so the bounds of X would come from F'Result.
This can cause spurious Constraint_Errors in some obscure
cases. It causes unnecessary checks to be inserted, and even
when such checks pass (more common case), they might be less
efficient.

gcc/ada/

* exp_ch3.adb (Expand_N_Object_Declaration): Avoid transforming to
a renaming in case of constrained array that comes from source.

ada: Fix wrong resolution for hidden discriminant in predicate

The problem occurs for hidden discriminants of private discriminated types.

gcc/ada/

* sem_ch13.adb (Replace_Type_References_Generic.Visible_Component):
In the case of private discriminated types, return a discriminant
only if it is listed in the discriminant part of the declaration.

testsuite: Unbreak pr110557.cc where long is 32-bit

On ports with 32-bit long, the test produced excess errors:

gcc/testsuite/g++.dg/vect/pr110557.cc:12:8: warning: width of
'Item::y' exceeds its type

Reported-by: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
gcc/testsuite/ChangeLog:

* g++.dg/vect/pr110557.cc: Use long long instead of long for
64-bit type.
(test): Remove an unnecessary cast.

libgcc: Fix -Wint-conversion warning in find_fde_tail

Fixes commit r14-1614-g49310a99330849 ("libgcc: Fix eh_frame fast path
in find_fde_tail").

libgcc/

PR libgcc/110179
* unwind-dw2-fde-dip.c (find_fde_tail): Add cast to avoid
implicit conversion of pointer value to integer.

Daily bump.

rs6000: Remove redundant MEM_P predicate usage

The quad_memory_operand and vsx_quad_dform_memory_operand predicates contain
a (match_code "mem") test, making their MEM_P usage redundant. Remove them.

2023-07-10 Peter Bergner <bergner@linux.ibm.com>

gcc/
* config/rs6000/predicates.md (quad_memory_operand): Remove redundant
MEM_P usage.
(vsx_quad_dform_memory_operand): Likewise.

d: Merge upstream dmd, druntime a88e1335f7, phobos 1921d29df.

D front-end changes:

- Import dmd v2.104.1.
- Deprecation phase ended for access to private method when
overloaded with public method.

D runtime changes:

- Import druntime v2.104.1.
- Linux input header translations were added to druntime.
- Integration with the Valgrind `memcheck' tool has been added
to the garbage collector.

Phobos changes:

- Import phobos v2.104.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd a88e1335f7.
* dmd/VERSION: Bump version to v2.104.1.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime a88e1335f7.
* src/MERGE: Merge upstream phobos 1921d29df.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac (libphobos-checking): Add valgrind flag.
(DRUNTIME_LIBRARIES_VALGRIND): Call.
* libdruntime/Makefile.am (DRUNTIME_CSOURCES): Add
etc/valgrind/valgrind_.c.
(DRUNTIME_DSOURCES): Add etc/valgrind/valgrind.d.
(DRUNTIME_DSOURCES_LINUX): Add core/sys/linux/input.d,
core/sys/linux/input_event_codes.d, core/sys/linux/uinput.d.
* libdruntime/Makefile.in: Regenerate.
* m4/druntime/libraries.m4 (DRUNTIME_LIBRARIES_VALGRIND): Define.

reorg: Change return type of predicate functions from int to bool

Also change some internal variables and function arguments from int to bool.

gcc/ChangeLog:

* reorg.cc (stop_search_p): Change return type from int to bool
and adjust function body accordingly.
(resource_conflicts_p): Ditto.
(insn_references_resource_p): Change return type from int to bool.
(insn_sets_resource_p): Ditto.
(redirect_with_delay_slots_safe_p): Ditto.
(condition_dominates_p): Change return type from int to bool
and adjust function body accordingly.
(redirect_with_delay_list_safe_p): Ditto.
(check_annul_list_true_false): Ditto.  Change "annul_true_p"
function argument to bool.
(steal_delay_list_from_target): Change "pannul_p" function
argument to bool pointer.  Change "must_annul" and "used_annul"
variables from int to bool.
(steal_delay_list_from_fallthrough): Ditto.
(own_thread_p): Change return type from int to bool and adjust
function body accordingly.  Change "allow_fallthrough" function
argument to bool.
(reorg_redirect_jump): Change return type from int to bool.
(fill_simple_delay_slots): Change "non_jumps_p" function
argument from int to bool.  Change "maybe_never" variable to bool.
(fill_slots_from_thread): Change "likely", "thread_if_true" and
"own_thread" function arguments to bool.  Change "lose" and
"must_annul" variables to bool.
(delete_from_delay_slot): Change "had_barrier" variable to bool.
(try_merge_delay_insns): Change "annul_p" variable to bool.
(fill_eager_delay_slots): Change "own_target" and "own_fallthrough"
variables to bool.
(rest_of_handle_delay_slots): Change return type from int to void
and adjust function body accordingly.

c++: redeclare_class_template and ttps [PR110523]

Now that we cache level-lowered ttps we can end up processing the same
ttp multiple times via (multiple calls to) redeclare_class_template, so
we can't assume a ttp's DECL_CONTEXT is initially empty.

PR c++/110523

gcc/cp/ChangeLog:

* pt.cc (redeclare_class_template): Relax the ttp DECL_CONTEXT
assert, and downgrade it to a checking assert.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp37.C: New test.

doc: Add doc for RISC-V Operand Modifiers

Document the `z` and `i` operand modifiers. We have many more modifiers
than those two, but they are the only two implemented in both GCC and
LLVM. For compatibility, I would like to document those two first, and
then review the other modifiers later to see whether any of them should
be exposed and implemented in RISC-V LLVM too.
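
A minimal usage sketch of the two modifiers follows. It is an illustration
under assumptions, not text from the documentation patch: the helper names
store_zero and add_five are hypothetical, the constraints "J" (integer zero)
and "I" (12-bit signed immediate) are the usual RISC-V machine constraints,
and the non-RISC-V branch is only a fallback so the sketch builds elsewhere.

```c
#include <stdint.h>

/* Store zero to *p.  When the "J" alternative is chosen, %z1 is expected
   to print the hard-wired register `zero` instead of an immediate.  */
static inline void
store_zero (int32_t *p)
{
#if defined (__riscv)
  __asm__ volatile ("sw %z1, %0" : "=m" (*p) : "rJ" (0));
#else
  *p = 0;   /* Fallback for other targets.  */
#endif
}

/* Add 5 to x.  When the "I" alternative is chosen, %i2 is expected to
   print `i`, turning the mnemonic `add` into `addi`.  */
static inline long
add_five (long x)
{
#if defined (__riscv)
  long r;
  __asm__ ("add%i2 %0, %1, %2" : "=r" (r) : "r" (x), "rI" (5));
  return r;
#else
  return x + 5;   /* Fallback for other targets.  */
#endif
}
```

Writing the template once with `%i2` lets the same asm statement serve both
the register and the immediate form of the operand.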

gcc/ChangeLog:

* doc/extend.texi (RISC-V Operand Modifiers): New.

GCSE: Export 'insert_insn_end_basic_block' as global function

The VSETVL pass in the RISC-V port uses the common part of 'insert_insn_end_basic_block (struct gcse_expr *expr, basic_block bb)',
and we will also use this helper function in riscv.cc in the following patches.

So extract the common code of 'insert_insn_end_basic_block (struct gcse_expr *expr, basic_block bb)'. The new function
for the common part is also called 'insert_insn_end_basic_block (rtx_insn *pat, basic_block bb)' but takes different arguments.
Call 'insert_insn_end_basic_block (rtx_insn *pat, basic_block bb)' from 'insert_insn_end_basic_block (struct gcse_expr *expr, basic_block bb)'
and from the VSETVL pass in the RISC-V port.

Remove the redundant code of the VSETVL pass in the RISC-V port.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (add_label_notes): Remove it.
(insert_insn_end_basic_block): Ditto.
(pass_vsetvl::commit_vsetvls): Adapt for new helper function.
* gcse.cc (insert_insn_end_basic_block): Export as global function.
* gcse.h (insert_insn_end_basic_block): Ditto.

arm: Fix MVE intrinsics support with LTO (PR target/110268)

After the recent MVE intrinsics re-implementation, LTO stopped working
because the intrinsics would no longer be defined.

The main part of the patch is simple and similar to what we do for
AArch64:
- call handle_arm_mve_h() from arm_init_mve_builtins to declare the
  intrinsics when the compiler is in LTO mode
- actually implement arm_builtin_decl for MVE.

It was just a bit tricky to handle __ARM_MVE_PRESERVE_USER_NAMESPACE:
its value in the user code cannot be guessed at LTO time, so we always
have to assume that it was not defined.  This led to a few fixes in the
way we register MVE builtins as placeholders or not.  Without this
patch, we would just omit some versions of the intrinsics when
__ARM_MVE_PRESERVE_USER_NAMESPACE is true. In fact, like for the C/C++
placeholders, we need to always keep entries for all of them to ensure
that we have a consistent numbering scheme.

2023-06-26  Christophe Lyon   <christophe.lyon@linaro.org>

PR target/110268
gcc/
* config/arm/arm-builtins.cc (arm_init_mve_builtins): Handle LTO.
(arm_builtin_decl): Handle MVE builtins.
* config/arm/arm-mve-builtins.cc (builtin_decl): New function.
(add_unique_function): Fix handling of
__ARM_MVE_PRESERVE_USER_NAMESPACE.
(add_overloaded_function): Likewise.
* config/arm/arm-protos.h (builtin_decl): New declaration.

gcc/testsuite/
* gcc.target/arm/pr110268-1.c: New test.
* gcc.target/arm/pr110268-2.c: New test.

testsuite: Add _link flavor for several arm_arch* and arm* effective-targets

For arm targets, we generate many effective-targets with
check_effective_target_FUNC_multilib and
check_effective_target_arm_arch_FUNC_multilib which check if we can
link and execute a simple program with a given set of flags/multilibs.

In some cases however, it's possible to link but not to execute a
program, so this patch adds similar _link effective-targets which only
check if link succeeds.

The patch does not update the documentation as it already lacks the
numerous existing related effective-targets.

2023-07-07 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* lib/target-supports.exp (arm_*FUNC_link): New effective-targets.

doc: Document arm_v8_1m_main_cde_mve_fp

The arm_v8_1m_main_cde_mve_fp family of effective targets was not
documented when it was introduced.

2023-07-07 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* doc/sourcebuild.texi (arm_v8_1m_main_cde_mve_fp): Document.

ada: Follow-up fix for compilation issue with recent MinGW-w64 versions

It turns out that adaint.c includes other Windows header files than just
windows.h, so defining WIN32_LEAN_AND_MEAN is not sufficient for it.

gcc/ada/

* adaint.c [_WIN32]: Undefine 'abort' macro.

ada: Add typedefs to snames.h-tmpl

A future patch will change snames.h-tmpl to use enums rather than
preprocessor defines. In order to do this, first introduce some
typedefs that can be used in gcc-interface.

gcc/ada/

* snames.h-tmpl (Name_Id, Attribute_Id, Convention_Id)
(Pragma_Id): New typedefs.
(Get_Attribute_Id, Get_Pragma_Id): Use typedef.

ada: Simplify assertion to remove CodePeer message

CodePeer is correctly warning on a test always true in an assertion.
It can be rewritten without loss of proof to avoid that message.

gcc/ada/

* libgnat/s-aridou.adb (Lemma_Powers_Of_2_Commutation): Rewrite
assertion.

ada: Documentation for mixed declarations and statements

This patch documents the new feature that allows declarations mixed with
statements, primarily by referring to the RFC.

gcc/ada/

* doc/gnat_rm/gnat_language_extensions.rst
(Local Declarations Without Block): Document the feature very
briefly, and refer the reader to the RFC for details and examples.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

ada: hardcfr: optionally disable in leaf functions

Document -fhardcfr-skip-leaf.

gcc/ada/

* doc/gnat_rm/security_hardening_features.rst (Control Flow
Hardening): Document -fhardcfr-skip-leaf.
* gnat_rm.texi: Regenerate.

ada: hardcfr: mark throw-expected functions

Adjust documentation to reflect the introduction of
-fhardcfr-check-noreturn-calls=no-xthrow.

gcc/ada/

* doc/gnat_rm/security_hardening_features.rst (Control Flow
Redundancy): Add -fhardcfr-check-noreturn-calls=no-xthrow.
* gnat_rm.texi: Regenerate.

ada: Adapt proof of System.Arith_Double to remove CVC4

The proof of System.Arith_Double still required the use of
CVC4, now replaced by its successor cvc5. Adapt the proof to be
able to remove CVC4 in the proof of run-time units.

gcc/ada/

* libgnat/s-aridou.adb (Lemma_Div_Mult): New simple lemma.
(Lemma_Powers_Of_2_Commutation): State post in else branch.
(Lemma_Div_Pow2): Introduce local lemma and use it.
(Scaled_Divide): Use cut operations in assertions, lemmas, new
assertions. Introduce local lemma and use it.

ada: Add leafy mode for zero-call-used-regs

Document leafy mode.

gcc/ada/

* doc/gnat_rm/security_hardening_features.rst (Register
Scrubbing): Document leafy mode.
* gnat_rm.texi: Regenerate.

vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

If a bit-field is signed and it's wider than the output type, we must
ensure the extracted result is sign-extended.  But this was not handled
correctly.

For example:

    int x : 8;
    long y : 55;
    bool z : 1;

The vectorized extraction of y was:

    vect__ifc__49.29_110 =
      MEM <vector(2) long unsigned int> [(struct Item *)vectp_a.27_108];
    vect_patt_38.30_112 =
      vect__ifc__49.29_110 & { 9223372036854775552, 9223372036854775552 };
    vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
    vect_patt_40.32_114 =
      VIEW_CONVERT_EXPR<vector(2) long int>(vect_patt_39.31_113);

This is obviously incorrect.  This patch implements it as:

    vect__ifc__25.16_62 =
      MEM <vector(2) long unsigned int> [(struct Item *)vectp_a.14_60];
    vect_patt_31.17_63 =
      VIEW_CONVERT_EXPR<vector(2) long int>(vect__ifc__25.16_62);
    vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
    vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;

gcc/ChangeLog:

PR tree-optimization/110557
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern):
Ensure the output is sign-extended if necessary.

gcc/testsuite/ChangeLog:

PR tree-optimization/110557
* g++.dg/vect/pr110557.cc: New test.

i386: Add new insvti_lowpart_1 and insvdi_lowpart_1 patterns.

This patch implements another of Uros' suggestions, to investigate a
insvti_lowpart_1 pattern to improve TImode parameter passing on x86_64.
In PR 88873, the RTL the middle-end expands for passing V2DF in TImode
is subtly different from what it does for V2DI in TImode, sufficiently so
that my explanations for why insvti_lowpart_1 isn't required don't apply
in this case.

This patch adds an insvti_lowpart_1 pattern, complementing the existing
insvti_highpart_1 pattern, and also a 32-bit variant, insvdi_lowpart_1.
Because the middle-end represents 128-bit constants using CONST_WIDE_INT
and 64-bit constants using CONST_INT, it's easiest to treat these as
different patterns, rather than attempt <dwi> parameterization.

This patch also includes a peephole2 (actually a pair) to transform
xchg instructions into mov instructions, when one of the destinations
is unused.  This optimization is required to produce the optimal code
sequences below.

For the 64-bit case:

__int128 foo(__int128 x, unsigned long long y)
{
  __int128 m = ~((__int128)~0ull);
  __int128 t = x & m;
  __int128 r = t | y;
  return r;
}

Before:
        xchgq   %rdi, %rsi
        movq    %rdx, %rax
        xorl    %esi, %esi
        xorl    %edx, %edx
        orq     %rsi, %rax
        orq     %rdi, %rdx
        ret

After:
        movq    %rdx, %rax
        movq    %rsi, %rdx
        ret

For the 32-bit case:

long long bar(long long x, int y)
{
  long long mask = ~0ull << 32;
  long long t = x & mask;
  long long r = t | (unsigned int)y;
  return r;
}

Before:
        pushl   %ebx
        movl    12(%esp), %edx
        xorl    %ebx, %ebx
        xorl    %eax, %eax
        movl    16(%esp), %ecx
        orl     %ebx, %edx
        popl    %ebx
        orl     %ecx, %eax
        ret

After:
        movl    12(%esp), %eax
        movl    8(%esp), %edx
        ret

2023-07-10  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.md (peephole2): Transform xchg insn with a
REG_UNUSED note to a (simple) move.
(*insvti_lowpart_1): New define_insn_and_split.
(*insvdi_lowpart_1): Likewise.

gcc/testsuite/ChangeLog
* gcc.target/i386/insvdi_lowpart-1.c: New test case.
* gcc.target/i386/insvti_lowpart-1.c: Likewise.

i386: Add AVX512 support for STV of SI/DImode rotation by constant.

Following Uros' suggestion, this patch adds support for AVX512VL's
vpro[lr][dq] instructions to the recently added scalar-to-vector (STV)
enhancements to handle DImode and SImode rotations by a constant.

For the test cases:

unsigned long long rot1(unsigned long long x) {
  return (x>>1) | (x<<63);
}

void mem1(unsigned long long *p) {
  *p = rot1(*p);
}

with -m32 -O2 -mavx512vl, we currently generate:

rot1:   movl    4(%esp), %eax
        movl    8(%esp), %edx
        movl    %eax, %ecx
        shrdl   $1, %edx, %eax
        shrdl   $1, %ecx, %edx
        ret

mem1:   movl    4(%esp), %eax
        vmovq   (%eax), %xmm0
        vpshufd $20, %xmm0, %xmm0
        vpsrlq  $1, %xmm0, %xmm0
        vpshufd $136, %xmm0, %xmm0
        vmovq   %xmm0, (%eax)
        ret

with this patch, we now generate:

rot1:   vmovq   4(%esp), %xmm0
        vprorq  $1, %xmm0, %xmm0
        vmovd   %xmm0, %eax
        vpextrd $1, %xmm0, %edx
        ret

mem1:   movl    4(%esp), %eax
        vmovq   (%eax), %xmm0
        vprorq  $1, %xmm0, %xmm0
        vmovq   %xmm0, (%eax)
        ret

2023-07-10  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Tweak
gains/costs for ROTATE/ROTATERT by integer constant on AVX512VL.
(general_scalar_chain::convert_rotate): On TARGET_AVX512F generate
avx512vl_rolv2di or avx512vl_rolv4si when appropriate.

gcc/testsuite/ChangeLog
* gcc.target/i386/avx512vl-stv-rotatedi-1.c: New test case.

d: Merge upstream dmd, druntime 17ccd12af3, phobos 8d3800bee.

D front-end changes:

- Import dmd v2.104.0.
- Assignment-style syntax is now allowed for `alias this'.
- Overloading `extern(C)' functions is now an error.

D runtime changes:

- Import druntime v2.104.0.

Phobos changes:

- Import phobos v2.104.0.
- Better static assert messages when instantiating
`std.algorithm.iteration.permutations' with wrong inputs.
- Added `std.system.instructionSetArchitecture' and
`std.system.ISA'.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 17ccd12af3.
* dmd/VERSION: Bump version to v2.104.0.
* Make-lang.in (D_FRONTEND_OBJS): Rename d/apply.o to
d/postordervisitor.o.
* d-codegen.cc (make_location_t): Update for new front-end interface.
(build_filename_from_loc): Likewise.
(build_assert_call): Likewise.
(build_array_bounds_call): Likewise.
(build_bounds_index_condition): Likewise.
(build_bounds_slice_condition): Likewise.
(build_frame_type): Likewise.
(get_frameinfo): Likewise.
* d-diagnostic.cc (d_diagnostic_report_diagnostic): Likewise.
* decl.cc (build_decl_tree): Likewise.
(start_function): Likewise.
* expr.cc (ExprVisitor::visit (NewExp *)): Replace code generation of
`new pointer' with front-end lowering.
* runtime.def (NEWITEMT): Remove.
(NEWITEMIT): Remove.
* toir.cc (IRVisitor::visit (LabelStatement *)): Update for new
front-end interface.
* typeinfo.cc (check_typeinfo_type): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 17ccd12af3.
* src/MERGE: Merge upstream phobos 8d3800bee.

gcc/testsuite/ChangeLog:

* gdc.dg/asm4.d: Update test.

Add pre_reload splitter to detect fp min/max pattern.

We have ix86_expand_sse_fp_minmax to detect min/max semantics, but
it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false.  For
the testcase in the PR, there's an extra move from cmp_op0 to if_true,
so ix86_expand_sse_fp_minmax fails to match.

This patch adds a pre_reload splitter to detect the min/max pattern.

Operand order in MINSS matters for signed zeros and NaNs, since the
instruction always returns the second operand when either operand is a
NaN or both operands are zero.
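
As a rough illustration of that rule, the scalar behaviour of MINSS can
be modelled in C by a single comparison.  This is a sketch only, not code
from the patch; the helper name minss_model is hypothetical.

```c
#include <math.h>   /* NAN, isnan, signbit for callers */

/* Hypothetical model of MINSS dst, src: the result is dst only when
   dst < src compares true.  The comparison is false for any NaN operand
   and for +0.0 versus -0.0, so in those cases the second operand (src)
   is returned.  */
static float
minss_model (float dst, float src)
{
  return dst < src ? dst : src;
}
```

With this model, minss_model (NAN, 1.0f) yields 1.0f while
minss_model (1.0f, NAN) yields a NaN, and minss_model (0.0f, -0.0f)
yields -0.0f, so swapping the operands can change the result.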

gcc/ChangeLog:

PR target/110170
* config/i386/i386.md (*ieee_max<mode>3_1): New pre_reload
splitter to detect fp max pattern.
(*ieee_min<mode>3_1): Ditto, but for fp min pattern.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr110170.C: New test.
* gcc.target/i386/pr110170.c: New test.

Daily bump.

d: Merge upstream dmd, druntime 28a3b24c2e, phobos 8ab95ded5.

D front-end changes:

- Import dmd v2.104.0-beta.1.
- Better error message when attribute inference fails down the
  call stack.
- Using `;' as an empty statement has been turned into an error.
- Using `in' parameters with non- `extern(D)' or `extern(C++)'
  functions is deprecated.
- `in ref' on parameters has been deprecated in favor of
  `-preview=in'.
- Throwing `immutable', `const', `inout', and `shared' qualified
  objects is now deprecated.
- User Defined Attributes now parse Template Arguments.

D runtime changes:

- Import druntime v2.104.0-beta.1.

Phobos changes:

- Import phobos v2.104.0-beta.1.
- Better static assert messages when instantiating
  `std.algorithm.comparison.clamp' with wrong inputs.
- `std.typecons.Rebindable' now supports all types.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 28a3b24c2e.
* dmd/VERSION: Bump version to v2.104.0-beta.1.
* d-codegen.cc (build_bounds_slice_condition): Update for new
front-end interface.
* d-lang.cc (d_init_options): Likewise.
(d_handle_option): Likewise.
(d_post_options): Initialize global.compileEnv.
* expr.cc (ExprVisitor::visit (CatExp *)): Replace code generation
with new front-end lowering.
(ExprVisitor::visit (LoweredAssignExp *)): New method.
(ExprVisitor::visit (StructLiteralExp *)): Don't generate static
initializer symbols for structs defined in C sources.
* runtime.def (ARRAYCATT): Remove.
(ARRAYCATNTX): Remove.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 28a3b24c2e.
* src/MERGE: Merge upstream phobos 8ab95ded5.

gcc/testsuite/ChangeLog:

* gdc.dg/rtti1.d: Move array concat testcase to ...
* gdc.dg/nogc1.d: ... here.  New test.

Improve dumping of profile_count

Dumps of profile_counts are quite hard to interpret since they are 64-bit fixed-point
values.  In many cases one looks at a single function, and it is better to think of
basic block frequency, that is, how many times a block is executed per invocation.  This
patch makes CFG dumps also print this info.

For example:
main()
{
  for (int i = 0; i < 10; i++)
    t();
}

The -fdump-tree-optimized-blocks-details dump now prints:
int main ()
{
  unsigned int ivtmp_1;
  unsigned int ivtmp_2;

;;   basic block 2, loop depth 0, count 97603128 (estimated locally, freq 1.0000), maybe hot
;;    prev block 0, next block 3, flags: (NEW, VISITED)
;;    pred:       ENTRY [always]  count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE)
;;    succ:       3 [always]  count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE)

;;   basic block 3, loop depth 1, count 976138697 (estimated locally, freq 10.0011), maybe hot
;;    prev block 2, next block 4, flags: (NEW, VISITED)
;;    pred:       3 [90.0% (guessed)]  count:878535568 (estimated locally, freq 9.0011) (TRUE_VALUE,EXECUTABLE)
;;                2 [always]  count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE)
  # ivtmp_2 = PHI <ivtmp_1(3), 10(2)>
  t ();
  ivtmp_1 = ivtmp_2 + 4294967295;
  if (ivtmp_1 != 0)
    goto <bb 3>; [90.00%]
  else
    goto <bb 4>; [10.00%]
;;    succ:       3 [90.0% (guessed)]  count:878535568 (estimated locally, freq 9.0011) (TRUE_VALUE,EXECUTABLE)
;;                4 [10.0% (guessed)]  count:97603129 (estimated locally, freq 1.0000) (FALSE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 0, count 97603128 (estimated locally, freq 1.0000), maybe hot
;;    prev block 3, next block 1, flags: (NEW, VISITED)
;;    pred:       3 [10.0% (guessed)]  count:97603129 (estimated locally, freq 1.0000) (FALSE_VALUE,EXECUTABLE)
  return 0;
;;    succ:       EXIT [always]  count:97603128 (estimated locally, freq 1.0000) (EXECUTABLE)

}

This makes it easier to see that the inner bb is executed 10 times per invocation.
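
The freq values in the dump above are consistent with dividing each count
by the count of the function's entry block.  A minimal sketch of that
arithmetic in C, using plain double division rather than GCC's internal
representation; the helper name relative_freq is hypothetical and this is
not the code added to cfg.cc:

```c
#include <stdint.h>

/* Hypothetical helper: frequency of a block or edge relative to the
   function entry, i.e. executions per invocation.  Assumes entry_count
   is nonzero.  */
static double
relative_freq (uint64_t count, uint64_t entry_count)
{
  return (double) count / (double) entry_count;
}
```

For bb 3 above, relative_freq (976138697, 97603128) is approximately
10.0011, and for the back edge relative_freq (878535568, 97603128) is
approximately 9.0011, in line with the dump.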

gcc/ChangeLog:

* cfg.cc (check_bb_profile): Dump counts with relative frequency.
(dump_edge_info): Likewise.
(dump_bb_info): Likewise.
* profile-count.cc (profile_count::dump): Add comma between quality and
freq.

gcc/testsuite/ChangeLog:

* gcc.dg/predict-22.c: Update template.

Daily bump.

Add missing profile_dump check

gcc/ChangeLog:

PR tree-optimization/110600
* cfgloopmanip.cc (scale_loop_profile): Add missing profile_dump check.

gcc/testsuite/ChangeLog:

PR tree-optimization/110600
* gcc.c-torture/compile/pr110600.c: New test.

Fortran: Fix default type bugs in gfortran [PR99139, PR99368]

2023-07-08 Steve Kargl <sgk@troutmask.apl.washington.edu>

gcc/fortran
PR fortran/99139
PR fortran/99368
* match.cc (gfc_match_namelist): Check for host associated or
defined types before applying default type.
(gfc_match_select_rank): Apply default type to selector of
unknown type if possible.
* resolve.cc (resolve_fl_variable): Do not apply local default
initialization to assumed rank entities.

gcc/testsuite/
PR fortran/99139
* gfortran.dg/pr99139.f90: New test.

PR fortran/99368
* gfortran.dg/pr99368.f90: New test.