RISC-V: Add tuple type vget/vset intrinsics

gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (valid_type): Adapt for
tuple type support.
(inttype): Ditto.
(floattype): Ditto.
(main): Ditto.
* config/riscv/riscv-vector-builtins-bases.cc: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (vset): Add
tuple type vset.
(vget): Add tuple type vget.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_TUPLE_OPS): New macro.
(vint8mf8x2_t): Ditto.
(vuint8mf8x2_t): Ditto.
(vint8mf8x3_t): Ditto.
(vuint8mf8x3_t): Ditto.
(vint8mf8x4_t): Ditto.
(vuint8mf8x4_t): Ditto.
(vint8mf8x5_t): Ditto.
(vuint8mf8x5_t): Ditto.
(vint8mf8x6_t): Ditto.
(vuint8mf8x6_t): Ditto.
(vint8mf8x7_t): Ditto.
(vuint8mf8x7_t): Ditto.
(vint8mf8x8_t): Ditto.
(vuint8mf8x8_t): Ditto.
(vint8mf4x2_t): Ditto.
(vuint8mf4x2_t): Ditto.
(vint8mf4x3_t): Ditto.
(vuint8mf4x3_t): Ditto.
(vint8mf4x4_t): Ditto.
(vuint8mf4x4_t): Ditto.
(vint8mf4x5_t): Ditto.
(vuint8mf4x5_t): Ditto.
(vint8mf4x6_t): Ditto.
(vuint8mf4x6_t): Ditto.
(vint8mf4x7_t): Ditto.
(vuint8mf4x7_t): Ditto.
(vint8mf4x8_t): Ditto.
(vuint8mf4x8_t): Ditto.
(vint8mf2x2_t): Ditto.
(vuint8mf2x2_t): Ditto.
(vint8mf2x3_t): Ditto.
(vuint8mf2x3_t): Ditto.
(vint8mf2x4_t): Ditto.
(vuint8mf2x4_t): Ditto.
(vint8mf2x5_t): Ditto.
(vuint8mf2x5_t): Ditto.
(vint8mf2x6_t): Ditto.
(vuint8mf2x6_t): Ditto.
(vint8mf2x7_t): Ditto.
(vuint8mf2x7_t): Ditto.
(vint8mf2x8_t): Ditto.
(vuint8mf2x8_t): Ditto.
(vint8m1x2_t): Ditto.
(vuint8m1x2_t): Ditto.
(vint8m1x3_t): Ditto.
(vuint8m1x3_t): Ditto.
(vint8m1x4_t): Ditto.
(vuint8m1x4_t): Ditto.
(vint8m1x5_t): Ditto.
(vuint8m1x5_t): Ditto.
(vint8m1x6_t): Ditto.
(vuint8m1x6_t): Ditto.
(vint8m1x7_t): Ditto.
(vuint8m1x7_t): Ditto.
(vint8m1x8_t): Ditto.
(vuint8m1x8_t): Ditto.
(vint8m2x2_t): Ditto.
(vuint8m2x2_t): Ditto.
(vint8m2x3_t): Ditto.
(vuint8m2x3_t): Ditto.
(vint8m2x4_t): Ditto.
(vuint8m2x4_t): Ditto.
(vint8m4x2_t): Ditto.
(vuint8m4x2_t): Ditto.
(vint16mf4x2_t): Ditto.
(vuint16mf4x2_t): Ditto.
(vint16mf4x3_t): Ditto.
(vuint16mf4x3_t): Ditto.
(vint16mf4x4_t): Ditto.
(vuint16mf4x4_t): Ditto.
(vint16mf4x5_t): Ditto.
(vuint16mf4x5_t): Ditto.
(vint16mf4x6_t): Ditto.
(vuint16mf4x6_t): Ditto.
(vint16mf4x7_t): Ditto.
(vuint16mf4x7_t): Ditto.
(vint16mf4x8_t): Ditto.
(vuint16mf4x8_t): Ditto.
(vint16mf2x2_t): Ditto.
(vuint16mf2x2_t): Ditto.
(vint16mf2x3_t): Ditto.
(vuint16mf2x3_t): Ditto.
(vint16mf2x4_t): Ditto.
(vuint16mf2x4_t): Ditto.
(vint16mf2x5_t): Ditto.
(vuint16mf2x5_t): Ditto.
(vint16mf2x6_t): Ditto.
(vuint16mf2x6_t): Ditto.
(vint16mf2x7_t): Ditto.
(vuint16mf2x7_t): Ditto.
(vint16mf2x8_t): Ditto.
(vuint16mf2x8_t): Ditto.
(vint16m1x2_t): Ditto.
(vuint16m1x2_t): Ditto.
(vint16m1x3_t): Ditto.
(vuint16m1x3_t): Ditto.
(vint16m1x4_t): Ditto.
(vuint16m1x4_t): Ditto.
(vint16m1x5_t): Ditto.
(vuint16m1x5_t): Ditto.
(vint16m1x6_t): Ditto.
(vuint16m1x6_t): Ditto.
(vint16m1x7_t): Ditto.
(vuint16m1x7_t): Ditto.
(vint16m1x8_t): Ditto.
(vuint16m1x8_t): Ditto.
(vint16m2x2_t): Ditto.
(vuint16m2x2_t): Ditto.
(vint16m2x3_t): Ditto.
(vuint16m2x3_t): Ditto.
(vint16m2x4_t): Ditto.
(vuint16m2x4_t): Ditto.
(vint16m4x2_t): Ditto.
(vuint16m4x2_t): Ditto.
(vint32mf2x2_t): Ditto.
(vuint32mf2x2_t): Ditto.
(vint32mf2x3_t): Ditto.
(vuint32mf2x3_t): Ditto.
(vint32mf2x4_t): Ditto.
(vuint32mf2x4_t): Ditto.
(vint32mf2x5_t): Ditto.
(vuint32mf2x5_t): Ditto.
(vint32mf2x6_t): Ditto.
(vuint32mf2x6_t): Ditto.
(vint32mf2x7_t): Ditto.
(vuint32mf2x7_t): Ditto.
(vint32mf2x8_t): Ditto.
(vuint32mf2x8_t): Ditto.
(vint32m1x2_t): Ditto.
(vuint32m1x2_t): Ditto.
(vint32m1x3_t): Ditto.
(vuint32m1x3_t): Ditto.
(vint32m1x4_t): Ditto.
(vuint32m1x4_t): Ditto.
(vint32m1x5_t): Ditto.
(vuint32m1x5_t): Ditto.
(vint32m1x6_t): Ditto.
(vuint32m1x6_t): Ditto.
(vint32m1x7_t): Ditto.
(vuint32m1x7_t): Ditto.
(vint32m1x8_t): Ditto.
(vuint32m1x8_t): Ditto.
(vint32m2x2_t): Ditto.
(vuint32m2x2_t): Ditto.
(vint32m2x3_t): Ditto.
(vuint32m2x3_t): Ditto.
(vint32m2x4_t): Ditto.
(vuint32m2x4_t): Ditto.
(vint32m4x2_t): Ditto.
(vuint32m4x2_t): Ditto.
(vint64m1x2_t): Ditto.
(vuint64m1x2_t): Ditto.
(vint64m1x3_t): Ditto.
(vuint64m1x3_t): Ditto.
(vint64m1x4_t): Ditto.
(vuint64m1x4_t): Ditto.
(vint64m1x5_t): Ditto.
(vuint64m1x5_t): Ditto.
(vint64m1x6_t): Ditto.
(vuint64m1x6_t): Ditto.
(vint64m1x7_t): Ditto.
(vuint64m1x7_t): Ditto.
(vint64m1x8_t): Ditto.
(vuint64m1x8_t): Ditto.
(vint64m2x2_t): Ditto.
(vuint64m2x2_t): Ditto.
(vint64m2x3_t): Ditto.
(vuint64m2x3_t): Ditto.
(vint64m2x4_t): Ditto.
(vuint64m2x4_t): Ditto.
(vint64m4x2_t): Ditto.
(vuint64m4x2_t): Ditto.
(vfloat32mf2x2_t): Ditto.
(vfloat32mf2x3_t): Ditto.
(vfloat32mf2x4_t): Ditto.
(vfloat32mf2x5_t): Ditto.
(vfloat32mf2x6_t): Ditto.
(vfloat32mf2x7_t): Ditto.
(vfloat32mf2x8_t): Ditto.
(vfloat32m1x2_t): Ditto.
(vfloat32m1x3_t): Ditto.
(vfloat32m1x4_t): Ditto.
(vfloat32m1x5_t): Ditto.
(vfloat32m1x6_t): Ditto.
(vfloat32m1x7_t): Ditto.
(vfloat32m1x8_t): Ditto.
(vfloat32m2x2_t): Ditto.
(vfloat32m2x3_t): Ditto.
(vfloat32m2x4_t): Ditto.
(vfloat32m4x2_t): Ditto.
(vfloat64m1x2_t): Ditto.
(vfloat64m1x3_t): Ditto.
(vfloat64m1x4_t): Ditto.
(vfloat64m1x5_t): Ditto.
(vfloat64m1x6_t): Ditto.
(vfloat64m1x7_t): Ditto.
(vfloat64m1x8_t): Ditto.
(vfloat64m2x2_t): Ditto.
(vfloat64m2x3_t): Ditto.
(vfloat64m2x4_t): Ditto.
(vfloat64m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TUPLE_OPS):
Ditto.
(DEF_RVV_TYPE_INDEX): Ditto.
(rvv_arg_type_info::get_tuple_subpart_type): New function.
(DEF_RVV_TUPLE_TYPE): New macro.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE_INDEX):
Adapt for tuple vget/vset support.
(vint8mf4_t): Ditto.
(vuint8mf4_t): Ditto.
(vint8mf2_t): Ditto.
(vuint8mf2_t): Ditto.
(vint8m1_t): Ditto.
(vuint8m1_t): Ditto.
(vint8m2_t): Ditto.
(vuint8m2_t): Ditto.
(vint8m4_t): Ditto.
(vuint8m4_t): Ditto.
(vint8m8_t): Ditto.
(vuint8m8_t): Ditto.
(vint16mf4_t): Ditto.
(vuint16mf4_t): Ditto.
(vint16mf2_t): Ditto.
(vuint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vuint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vuint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vuint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): Ditto.
(vuint32mf2_t): Ditto.
(vint32m1_t): Ditto.
(vuint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vuint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vuint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32m8_t): Ditto.
(vint64m1_t): Ditto.
(vuint64m1_t): Ditto.
(vint64m2_t): Ditto.
(vuint64m2_t): Ditto.
(vint64m4_t): Ditto.
(vuint64m4_t): Ditto.
(vint64m8_t): Ditto.
(vuint64m8_t): Ditto.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vfloat64m1_t): Ditto.
(vfloat64m2_t): Ditto.
(vfloat64m4_t): Ditto.
(vfloat64m8_t): Ditto.
(tuple_subpart): Add tuple subpart base type.
* config/riscv/riscv-vector-builtins.h (struct
rvv_arg_type_info): Ditto.
(tuple_type_field): New function.

Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

RISC-V: Add tuple types support

gcc/ChangeLog:

* config/riscv/riscv-modes.def (RVV_TUPLE_MODES): New macro.
(RVV_TUPLE_PARTIAL_MODES): Ditto.
* config/riscv/riscv-protos.h (riscv_v_ext_tuple_mode_p): New
function.
(get_nf): Ditto.
(get_subpart_mode): Ditto.
(get_tuple_mode): Ditto.
(expand_tuple_move): Ditto.
* config/riscv/riscv-v.cc (ENTRY): New macro.
(TUPLE_ENTRY): Ditto.
(get_nf): New function.
(get_subpart_mode): Ditto.
(get_tuple_mode): Ditto.
(expand_tuple_move): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TUPLE_TYPE):
New macro.
(register_tuple_type): New function.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TUPLE_TYPE):
New macro.
(vint8mf8x2_t): New macro.
(vuint8mf8x2_t): Ditto.
(vint8mf8x3_t): Ditto.
(vuint8mf8x3_t): Ditto.
(vint8mf8x4_t): Ditto.
(vuint8mf8x4_t): Ditto.
(vint8mf8x5_t): Ditto.
(vuint8mf8x5_t): Ditto.
(vint8mf8x6_t): Ditto.
(vuint8mf8x6_t): Ditto.
(vint8mf8x7_t): Ditto.
(vuint8mf8x7_t): Ditto.
(vint8mf8x8_t): Ditto.
(vuint8mf8x8_t): Ditto.
(vint8mf4x2_t): Ditto.
(vuint8mf4x2_t): Ditto.
(vint8mf4x3_t): Ditto.
(vuint8mf4x3_t): Ditto.
(vint8mf4x4_t): Ditto.
(vuint8mf4x4_t): Ditto.
(vint8mf4x5_t): Ditto.
(vuint8mf4x5_t): Ditto.
(vint8mf4x6_t): Ditto.
(vuint8mf4x6_t): Ditto.
(vint8mf4x7_t): Ditto.
(vuint8mf4x7_t): Ditto.
(vint8mf4x8_t): Ditto.
(vuint8mf4x8_t): Ditto.
(vint8mf2x2_t): Ditto.
(vuint8mf2x2_t): Ditto.
(vint8mf2x3_t): Ditto.
(vuint8mf2x3_t): Ditto.
(vint8mf2x4_t): Ditto.
(vuint8mf2x4_t): Ditto.
(vint8mf2x5_t): Ditto.
(vuint8mf2x5_t): Ditto.
(vint8mf2x6_t): Ditto.
(vuint8mf2x6_t): Ditto.
(vint8mf2x7_t): Ditto.
(vuint8mf2x7_t): Ditto.
(vint8mf2x8_t): Ditto.
(vuint8mf2x8_t): Ditto.
(vint8m1x2_t): Ditto.
(vuint8m1x2_t): Ditto.
(vint8m1x3_t): Ditto.
(vuint8m1x3_t): Ditto.
(vint8m1x4_t): Ditto.
(vuint8m1x4_t): Ditto.
(vint8m1x5_t): Ditto.
(vuint8m1x5_t): Ditto.
(vint8m1x6_t): Ditto.
(vuint8m1x6_t): Ditto.
(vint8m1x7_t): Ditto.
(vuint8m1x7_t): Ditto.
(vint8m1x8_t): Ditto.
(vuint8m1x8_t): Ditto.
(vint8m2x2_t): Ditto.
(vuint8m2x2_t): Ditto.
(vint8m2x3_t): Ditto.
(vuint8m2x3_t): Ditto.
(vint8m2x4_t): Ditto.
(vuint8m2x4_t): Ditto.
(vint8m4x2_t): Ditto.
(vuint8m4x2_t): Ditto.
(vint16mf4x2_t): Ditto.
(vuint16mf4x2_t): Ditto.
(vint16mf4x3_t): Ditto.
(vuint16mf4x3_t): Ditto.
(vint16mf4x4_t): Ditto.
(vuint16mf4x4_t): Ditto.
(vint16mf4x5_t): Ditto.
(vuint16mf4x5_t): Ditto.
(vint16mf4x6_t): Ditto.
(vuint16mf4x6_t): Ditto.
(vint16mf4x7_t): Ditto.
(vuint16mf4x7_t): Ditto.
(vint16mf4x8_t): Ditto.
(vuint16mf4x8_t): Ditto.
(vint16mf2x2_t): Ditto.
(vuint16mf2x2_t): Ditto.
(vint16mf2x3_t): Ditto.
(vuint16mf2x3_t): Ditto.
(vint16mf2x4_t): Ditto.
(vuint16mf2x4_t): Ditto.
(vint16mf2x5_t): Ditto.
(vuint16mf2x5_t): Ditto.
(vint16mf2x6_t): Ditto.
(vuint16mf2x6_t): Ditto.
(vint16mf2x7_t): Ditto.
(vuint16mf2x7_t): Ditto.
(vint16mf2x8_t): Ditto.
(vuint16mf2x8_t): Ditto.
(vint16m1x2_t): Ditto.
(vuint16m1x2_t): Ditto.
(vint16m1x3_t): Ditto.
(vuint16m1x3_t): Ditto.
(vint16m1x4_t): Ditto.
(vuint16m1x4_t): Ditto.
(vint16m1x5_t): Ditto.
(vuint16m1x5_t): Ditto.
(vint16m1x6_t): Ditto.
(vuint16m1x6_t): Ditto.
(vint16m1x7_t): Ditto.
(vuint16m1x7_t): Ditto.
(vint16m1x8_t): Ditto.
(vuint16m1x8_t): Ditto.
(vint16m2x2_t): Ditto.
(vuint16m2x2_t): Ditto.
(vint16m2x3_t): Ditto.
(vuint16m2x3_t): Ditto.
(vint16m2x4_t): Ditto.
(vuint16m2x4_t): Ditto.
(vint16m4x2_t): Ditto.
(vuint16m4x2_t): Ditto.
(vint32mf2x2_t): Ditto.
(vuint32mf2x2_t): Ditto.
(vint32mf2x3_t): Ditto.
(vuint32mf2x3_t): Ditto.
(vint32mf2x4_t): Ditto.
(vuint32mf2x4_t): Ditto.
(vint32mf2x5_t): Ditto.
(vuint32mf2x5_t): Ditto.
(vint32mf2x6_t): Ditto.
(vuint32mf2x6_t): Ditto.
(vint32mf2x7_t): Ditto.
(vuint32mf2x7_t): Ditto.
(vint32mf2x8_t): Ditto.
(vuint32mf2x8_t): Ditto.
(vint32m1x2_t): Ditto.
(vuint32m1x2_t): Ditto.
(vint32m1x3_t): Ditto.
(vuint32m1x3_t): Ditto.
(vint32m1x4_t): Ditto.
(vuint32m1x4_t): Ditto.
(vint32m1x5_t): Ditto.
(vuint32m1x5_t): Ditto.
(vint32m1x6_t): Ditto.
(vuint32m1x6_t): Ditto.
(vint32m1x7_t): Ditto.
(vuint32m1x7_t): Ditto.
(vint32m1x8_t): Ditto.
(vuint32m1x8_t): Ditto.
(vint32m2x2_t): Ditto.
(vuint32m2x2_t): Ditto.
(vint32m2x3_t): Ditto.
(vuint32m2x3_t): Ditto.
(vint32m2x4_t): Ditto.
(vuint32m2x4_t): Ditto.
(vint32m4x2_t): Ditto.
(vuint32m4x2_t): Ditto.
(vint64m1x2_t): Ditto.
(vuint64m1x2_t): Ditto.
(vint64m1x3_t): Ditto.
(vuint64m1x3_t): Ditto.
(vint64m1x4_t): Ditto.
(vuint64m1x4_t): Ditto.
(vint64m1x5_t): Ditto.
(vuint64m1x5_t): Ditto.
(vint64m1x6_t): Ditto.
(vuint64m1x6_t): Ditto.
(vint64m1x7_t): Ditto.
(vuint64m1x7_t): Ditto.
(vint64m1x8_t): Ditto.
(vuint64m1x8_t): Ditto.
(vint64m2x2_t): Ditto.
(vuint64m2x2_t): Ditto.
(vint64m2x3_t): Ditto.
(vuint64m2x3_t): Ditto.
(vint64m2x4_t): Ditto.
(vuint64m2x4_t): Ditto.
(vint64m4x2_t): Ditto.
(vuint64m4x2_t): Ditto.
(vfloat32mf2x2_t): Ditto.
(vfloat32mf2x3_t): Ditto.
(vfloat32mf2x4_t): Ditto.
(vfloat32mf2x5_t): Ditto.
(vfloat32mf2x6_t): Ditto.
(vfloat32mf2x7_t): Ditto.
(vfloat32mf2x8_t): Ditto.
(vfloat32m1x2_t): Ditto.
(vfloat32m1x3_t): Ditto.
(vfloat32m1x4_t): Ditto.
(vfloat32m1x5_t): Ditto.
(vfloat32m1x6_t): Ditto.
(vfloat32m1x7_t): Ditto.
(vfloat32m1x8_t): Ditto.
(vfloat32m2x2_t): Ditto.
(vfloat32m2x3_t): Ditto.
(vfloat32m2x4_t): Ditto.
(vfloat32m4x2_t): Ditto.
(vfloat64m1x2_t): Ditto.
(vfloat64m1x3_t): Ditto.
(vfloat64m1x4_t): Ditto.
(vfloat64m1x5_t): Ditto.
(vfloat64m1x6_t): Ditto.
(vfloat64m1x7_t): Ditto.
(vfloat64m1x8_t): Ditto.
(vfloat64m2x2_t): Ditto.
(vfloat64m2x3_t): Ditto.
(vfloat64m2x4_t): Ditto.
(vfloat64m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.h (DEF_RVV_TUPLE_TYPE):
Ditto.
* config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): Ditto.
* config/riscv/riscv.cc (riscv_v_ext_tuple_mode_p): New
function.
(TUPLE_ENTRY): Ditto.
(riscv_v_ext_mode_p): New function.
(riscv_v_adjust_nunits): Add tuple mode adjustment.
(riscv_classify_address): Ditto.
(riscv_binary_cost): Ditto.
(riscv_rtx_costs): Ditto.
(riscv_secondary_memory_needed): Ditto.
(riscv_hard_regno_nregs): Ditto.
(riscv_hard_regno_mode_ok): Ditto.
(riscv_vector_mode_supported_p): Ditto.
(riscv_regmode_natural_size): Ditto.
(riscv_array_mode): New function.
(TARGET_ARRAY_MODE): New target hook.
* config/riscv/riscv.md: Add tuple modes.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md (mov<mode>): Add tuple modes data
movement.
(*mov<VT:mode>_<P:mode>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-10.c: New test.
* gcc.target/riscv/rvv/base/abi-11.c: New test.
* gcc.target/riscv/rvv/base/abi-12.c: New test.
* gcc.target/riscv/rvv/base/abi-13.c: New test.
* gcc.target/riscv/rvv/base/abi-14.c: New test.
* gcc.target/riscv/rvv/base/abi-15.c: New test.
* gcc.target/riscv/rvv/base/abi-16.c: New test.
* gcc.target/riscv/rvv/base/abi-8.c: New test.
* gcc.target/riscv/rvv/base/abi-9.c: New test.
* gcc.target/riscv/rvv/base/tuple-1.c: New test.
* gcc.target/riscv/rvv/base/tuple-10.c: New test.
* gcc.target/riscv/rvv/base/tuple-11.c: New test.
* gcc.target/riscv/rvv/base/tuple-12.c: New test.
* gcc.target/riscv/rvv/base/tuple-13.c: New test.
* gcc.target/riscv/rvv/base/tuple-14.c: New test.
* gcc.target/riscv/rvv/base/tuple-15.c: New test.
* gcc.target/riscv/rvv/base/tuple-16.c: New test.
* gcc.target/riscv/rvv/base/tuple-17.c: New test.
* gcc.target/riscv/rvv/base/tuple-18.c: New test.
* gcc.target/riscv/rvv/base/tuple-19.c: New test.
* gcc.target/riscv/rvv/base/tuple-2.c: New test.
* gcc.target/riscv/rvv/base/tuple-20.c: New test.
* gcc.target/riscv/rvv/base/tuple-21.c: New test.
* gcc.target/riscv/rvv/base/tuple-22.c: New test.
* gcc.target/riscv/rvv/base/tuple-23.c: New test.
* gcc.target/riscv/rvv/base/tuple-24.c: New test.
* gcc.target/riscv/rvv/base/tuple-25.c: New test.
* gcc.target/riscv/rvv/base/tuple-26.c: New test.
* gcc.target/riscv/rvv/base/tuple-27.c: New test.
* gcc.target/riscv/rvv/base/tuple-3.c: New test.
* gcc.target/riscv/rvv/base/tuple-4.c: New test.
* gcc.target/riscv/rvv/base/tuple-5.c: New test.
* gcc.target/riscv/rvv/base/tuple-6.c: New test.
* gcc.target/riscv/rvv/base/tuple-7.c: New test.
* gcc.target/riscv/rvv/base/tuple-8.c: New test.
* gcc.target/riscv/rvv/base/tuple-9.c: New test.
* gcc.target/riscv/rvv/base/user-10.c: New test.
* gcc.target/riscv/rvv/base/user-11.c: New test.
* gcc.target/riscv/rvv/base/user-12.c: New test.
* gcc.target/riscv/rvv/base/user-13.c: New test.
* gcc.target/riscv/rvv/base/user-14.c: New test.
* gcc.target/riscv/rvv/base/user-15.c: New test.
* gcc.target/riscv/rvv/base/user-7.c: New test.
* gcc.target/riscv/rvv/base/user-8.c: New test.
* gcc.target/riscv/rvv/base/user-9.c: New test.

Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

Speedup cse_insn

When cse_insn prunes src{,_folded,_eqv_here,_related} with the
equivalence set in the *_same_value chain it also searches for
an equivalence to the destination of the instruction with

          /* This is the same as the destination of the insns, we want
             to prefer it.  Copy it to src_related.  The code below will
             then give it a negative cost.  */
          if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
            src_related = p->exp;

this picks up the last such equivalence and in particular any
later duplicate will be pruned by the preceding

          else if (src_related && GET_CODE (src_related) == code
                   && rtx_equal_p (src_related, p->exp))
            src_related = 0;

first.  This wastes cycles doing extra rtx_equal_p checks.  The
following instead searches for the first destination equivalence
separately in this loop and delays using src_related for it until
we are about to process that, avoiding another redundant rtx_equal_p
check.

I came here because of a testcase with very large equivalence
lists and compile-time of cse_insn.  The patch below doesn't speed
it up significantly since there's no equivalence on the destination.

In theory this opens the possibility to track dest_related
separately, avoiding the implicit pruning of any previous
value in src_related.  As is the change should be a no-op for
code generation.

* cse.cc (cse_insn): Track an equivalence to the destination
separately and delay using src_related for it.

Improve RTL CSE hash table hash usage

The RTL CSE hash table has a fixed number of buckets (32) each
with a linked list of entries with the same hash value.  The
actual hash values are computed using hash_rtx which uses adds
for mixing and adds the rtx CODE as CODE << 7 (apart from some
exceptions such as MEM).  The unsigned int typed hash value
is then simply truncated for the actual lookup into the fixed
size table which means that usually CODE is simply lost.

The following improves this truncation by first mixing in more
bits using xor.  It does not change the actual hash function
since that's used outside of CSE as well.
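
A minimal sketch of the difference between plain truncation and the
xor-mixed truncation described above.  The names below are illustrative,
not the ones in cse.cc, and the shift of 5 is an assumption derived from
the stated 32 buckets; toy_hash only mimics the "CODE << 7 plus operand
bits" shape of hash_rtx.

```cpp
// Illustrative constants: 32 buckets, i.e. a 5-bit bucket index (assumed).
constexpr unsigned kHashShift = 5;
constexpr unsigned kHashSize = 1u << kHashShift;
constexpr unsigned kHashMask = kHashSize - 1;

// Plain truncation: every bit at or above kHashShift is dropped.
constexpr unsigned bucket_truncate(unsigned h) { return h & kHashMask; }

// Mixed truncation: xor in the next kHashShift bits before masking,
// so part of the CODE contribution reaches the bucket index.
constexpr unsigned bucket_mixed(unsigned h) {
  return (h ^ (h >> kHashShift)) & kHashMask;
}

// Toy stand-in for hash_rtx: the code is added as code << 7 (hypothetical).
constexpr unsigned toy_hash(unsigned code, unsigned operand_bits) {
  return (code << 7) + operand_bits;
}
```

With this toy hash, two expressions that differ only in their code land in
the same bucket under bucket_truncate but in different buckets under
bucket_mixed.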

An alternative would be to bump the fixed number of buckets,
say to 256 which would retain the LSB of CODE or to 8192 which
can capture all 6 bits required for the last CODE.

As the comment in CSE says, there's invalidate_memory and
flush_hash_table done possibly frequently and those at least
need to walk all slots, so when the hash table is mostly empty
enlarging it will be a loss.  Still there should be more
regular lookups by hash, so fewer collisions should pay off
as well.

Without enlarging the table a better hash function is unlikely
to make a big difference; simple statistics on the
number of collisions at insertion time show a reduction of
around 10%.  Bumping HASH_SHIFT by 1 improves that to 30%
at the expense of reducing the average table fill by 10%
(all of these stats are from looking just at fold-const.i at -O2).
Increasing HASH_SHIFT more leaves the table even more sparse
likely showing that hash_rtx uses add for mixing which is
quite bad.  Bumping HASH_SHIFT by 2 removes 90% of all
collisions.

Experimenting with using inchash instead of adds for the
mixing does not improve things when looking at the HASH_SHIFT
bumped by 2 numbers.

* cse.cc (HASH): Turn into inline function and mix
in another HASH_SHIFT bits.
(SAFE_HASH): Likewise.

aarch64: PR target/99195 annotate HADDSUB patterns for vec-concat with zero

Further straightforward patch for the various halving intrinsics with or without rounding, plus tests.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_<sur>h<addsub><mode>): Rename to...
(aarch64_<sur>h<addsub><mode><vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for halving and rounding
add/sub intrinsics.

aarch64: PR target/99195 annotate simple floating-point patterns for vec-concat with zero

Continuing the, almost mechanical, series this patch adds annotation for some of the simple
floating-point patterns we have, and adds testing to ensure that redundant zeroing instructions
are eliminated.

Bootstrapped and tested on aarch64-none-linux-gnu and also aarch64_be-none-elf.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (add<mode>3): Rename to...
(add<mode>3<vczle><vczbe>): ... This.
(sub<mode>3): Rename to...
(sub<mode>3<vczle><vczbe>): ... This.
(mul<mode>3): Rename to...
(mul<mode>3<vczle><vczbe>): ... This.
(*div<mode>3): Rename to...
(*div<mode>3<vczle><vczbe>): ... This.
(neg<mode>2): Rename to...
(neg<mode>2<vczle><vczbe>): ... This.
(abs<mode>2): Rename to...
(abs<mode>2<vczle><vczbe>): ... This.
(<frint_pattern><mode>2): Rename to...
(<frint_pattern><mode>2<vczle><vczbe>): ... This.
(<fmaxmin><mode>3): Rename to...
(<fmaxmin><mode>3<vczle><vczbe>): ... This.
(*sqrt<mode>2): Rename to...
(*sqrt<mode>2<vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add testing for some unary
and binary floating-point ops.
* gcc.target/aarch64/simd/pr99195_2.c: New test.

Docs: Add vector register constraint for asm operands

`vr`, `vm` and `vd` are constraints for vector registers; these 3
constraints have been implemented in LLVM as well.

gcc/ChangeLog:

* doc/md.texi (RISC-V): Add vr, vm, vd constraint.

clang warning: warning: private field 'm_gc' is not used [-Wunused-private-field]

PR tree-optimization/109693

gcc/ChangeLog:

* value-range-storage.cc (vrange_allocator::vrange_allocator):
Remove unused field.
* value-range-storage.h: Likewise.

c++: Fix up VEC_INIT_EXPR gimplification after r12-7069

During patch backporting, I've noticed that while most cp_walk_tree calls
with the cp_fold_r callback were changed from &pset to cp_fold_data
&data, the VEC_INIT_EXPR gimplification was not, so it still passes just
the address of a hash_set<tree> and so if during the folding we ever touch
data->flags, we use uninitialized data there.

The following patch changes it to do the same thing as cp_fold_function
because the VEC_INIT_EXPR gimplifications will happen on function bodies
only.

2023-05-03 Jakub Jelinek <jakub@redhat.com>

* cp-gimplify.cc (cp_fold_data): Move definition earlier.
(cp_gimplify_expr): Pass address of ff_genericize | ff_mce_false
constructed data rather than &pset to cp_walk_tree with cp_fold_r.

c++: fix TTP level reduction cache

We try to cache the result of reduce_template_parm_level so that when we
reduce the same parm multiple times we get the same result, but this wasn't
working for template template parms because in that case TYPE is a
TEMPLATE_TEMPLATE_PARM, and so same_type_p was false because of the same
level mismatch that we're trying to adjust for. So in that case compare the
template parms of the template template parms instead.

The result can be seen in nontype12.C, where we previously gave three
duplicate errors on line 7 and now give only one because subsequent
substitutions use the cache.

gcc/cp/ChangeLog:

* pt.cc (reduce_template_parm_level): Fix comparison of
template template parm to cached version.

gcc/testsuite/ChangeLog:

* g++.dg/template/nontype12.C: Check for duplicate error.

Daily bump.

c++: simplify member template substitution

I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X) \
    template <class U> struct X##1; \
    template <class U> struct X##2; \
    template <class U> struct X##3; \
    template <class U> struct X##4; \
    template <class U> struct X##5; \
    template <class U> struct X##6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
of a class template.
(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.

PHIOPT: small refactoring of match_simplify_replacement.

When I added diamond shaped form bb to match_simplify_replacement,
I copied the code to move the statement rather than factoring it
out to a new function. This does the refactoring to a new function
to avoid the duplicated code. It will make adding support for having
two statements to move easier (the second statement will only be a
conversion).

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (move_stmt): New function.
(match_simplify_replacement): Use move_stmt instead
of the inlined version.

MATCH: Port CLRSB part of builtin_zero_pattern

This ports the clrsb builtin part of builtin_zero_pattern
to match.pd. A simple pattern to port.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (a != 0 ? CLRSB(a) : CST -> CLRSB(a)): New
pattern.
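
A small sketch of the source-level shape this pattern targets.  It
assumes that CST has to be the value CLRSB itself yields for zero (the
type precision minus one); that condition is my reading and is not
spelled out in the commit message.  The function names are illustrative.

```cpp
#include <climits>

// Value bits of int minus one: what __builtin_clrsb returns for 0.
constexpr int kIntPrecMinus1 = static_cast<int>(sizeof(int) * CHAR_BIT) - 1;

// Guarded form: "a != 0 ? CLRSB(a) : CST".
int clrsb_guarded(int a) {
  return a != 0 ? __builtin_clrsb(a) : kIntPrecMinus1;
}

// Simplified form: the guard is redundant because CLRSB(0) == CST.
int clrsb_plain(int a) { return __builtin_clrsb(a); }
```

Both functions return the same value for every input, which is what makes
dropping the comparison valid.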

tree-optimization: [PR109702] MATCH: Fix a ? func(a) : N patterns

I accidentally messed up these patterns so the comparison
against 0 and the arguments were not matching up when they
need to be.

I committed this as obvious after a bootstrap/test on x86_64-linux-gnu.

PR tree-optimization/109702

gcc/ChangeLog:

* match.pd: Fix "a != 0 ? FUNC(a) : CST" patterns
for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
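
To illustrate why the compared value and the function argument have to
be the same operand, here is a sketch using POPCOUNT.  The function names
are made up for this example; only the builtin is real.

```cpp
// Matching operands: "a != 0 ? POPCOUNT(a) : 0" equals POPCOUNT(a),
// because POPCOUNT(0) is 0.
int popcount_guarded(unsigned a) {
  return a != 0 ? __builtin_popcount(a) : 0;
}

int popcount_plain(unsigned a) { return __builtin_popcount(a); }

// Mismatched operands: the test is on a but the argument is b, so the
// guard cannot be dropped; rewriting this to POPCOUNT(b) would be wrong.
int popcount_mismatched(unsigned a, unsigned b) {
  return a != 0 ? __builtin_popcount(b) : 0;
}
```

For a == 0 and b == 7 the mismatched form yields 0 while POPCOUNT(b)
yields 3, so a pattern that does not tie the two operands together
changes behavior.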

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-25b.c: New test.

target: [PR109657] (a ? -1 : 0) | b could be optimized better for aarch64

There is no canonical form for this case defined. So the aarch64 backend needs
a pattern to match both of these forms.

The forms are:
(set (reg/i:SI 0 x0)
    (if_then_else:SI (eq (reg:CC 66 cc)
            (const_int 0 [0]))
        (reg:SI 97)
        (const_int -1 [0xffffffffffffffff])))
and
(set (reg/i:SI 0 x0)
    (ior:SI (neg:SI (ne:SI (reg:CC 66 cc)
                (const_int 0 [0])))
        (reg:SI 102)))

Currently the aarch64 backend matches the first form so this
patch adds an insn_and_split to match the second form and
convert it to the first form.
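
At the source level the two forms correspond to the following pair of
functions.  This is only a sketch with made-up names; which RTL form the
compiler produces for each depends on the optimization pipeline.

```cpp
// IOR form: (a ? -1 : 0) | b.
int ior_form(int a, int b) { return (a ? -1 : 0) | b; }

// Select form: all ones when a is non-zero, otherwise b.
int select_form(int a, int b) { return a ? -1 : b; }
```

The two are equivalent because -1 | b is -1 and 0 | b is b, so both can
be served by one conditional-select style instruction.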

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions

PR target/109657

gcc/ChangeLog:

* config/aarch64/aarch64.md (*cmov<mode>_insn_m1): New
insn_and_split pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/csinv-2.c: New test.

c++: less invalidate_class_lookup_cache

In the testcase below, we push_to_top_level to instantiate f and g, and they
can both use the previous_class_level cache from instantiating A<int>.
Wiping the cache in pop_from_top_level is not helpful; we'll do that in
pushclass if needed.

  template <class T> struct A
  {
    int i;
    void f() { i = 42; }
    void g() { i = 24; }
  };

  int main()
  {
    A<int> a;
    a.f();
    a.g();
  }

gcc/cp/ChangeLog:

* name-lookup.cc (pop_from_top_level): Don't
invalidate_class_lookup_cache.

c++: look for empty base at specific offset [PR109678]

While looking at the empty base handling for 109678, it occurred to me that
we ought to be able to look for an empty base at a specific offset, not just
in general.

PR c++/109678

gcc/cp/ChangeLog:

* cp-tree.h (lookup_base): Add offset parm.
* constexpr.cc (cxx_fold_indirect_ref_1): Pass it.
* search.cc (struct lookup_base_data_s): Add offset.
(dfs_lookup_base): Handle it.
(lookup_base): Pass it.

c++: std::variant slow to compile [PR109678]

Here, when dealing with a class with a complex subobject structure, we would
try and fail to find the relevant FIELD_DECL for an empty base before giving
up. And we would do this at each level, in a combinatorially problematic
way. Instead, we should check for an empty base first.

PR c++/109678

gcc/cp/ChangeLog:

* constexpr.cc (cxx_fold_indirect_ref_1): Handle empty base first.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/variant1.C: New test.

RISC-V: Table A.6 conformance tests

These tests cover basic cases to ensure the atomic mappings follow the
strengthened Table A.6 mappings that are compatible with Table A.7.

2023-04-27 Patrick O'Neill <patrick@rivosinc.com>

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-a-6-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-1.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-2.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-3.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-4.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-5.c: New test.
* gcc.target/riscv/amo-table-a-6-load-1.c: New test.
* gcc.target/riscv/amo-table-a-6-load-2.c: New test.
* gcc.target/riscv/amo-table-a-6-load-3.c: New test.
* gcc.target/riscv/amo-table-a-6-store-1.c: New test.
* gcc.target/riscv/amo-table-a-6-store-2.c: New test.
* gcc.target/riscv/amo-table-a-6-store-compat-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: New test.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Weaken atomic loads

This change brings atomic loads in line with table A.6 of the ISA
manual.
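
For reference, a sketch of the three load orderings at the C++ level.
The instruction sequences in the comments are my reading of Table A.6
for a word-sized load and are not taken from this commit; the function
names are illustrative.

```cpp
#include <atomic>

// Relaxed load; Table A.6 (as I read it): lw
int load_relaxed(const std::atomic<int>& x) {
  return x.load(std::memory_order_relaxed);
}

// Acquire load; Table A.6 (as I read it): lw; fence r,rw
int load_acquire(const std::atomic<int>& x) {
  return x.load(std::memory_order_acquire);
}

// Sequentially consistent load;
// Table A.6 (as I read it): fence rw,rw; lw; fence r,rw
int load_seq_cst(const std::atomic<int>& x) {
  return x.load(std::memory_order_seq_cst);
}
```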

2023-04-27 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:

* config/riscv/sync.md (atomic_load<mode>): Implement atomic
load mapping.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Weaken mem_thread_fence

This change brings atomic fences in line with table A.6 of the ISA
manual.

Relax mem_thread_fence according to the memmodel given.
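
A sketch of code whose fences are affected: a release fence on the
writer side paired with an acquire fence on the reader side.  The fence
mnemonics in the comments are my reading of Table A.6, and the helper
names are made up for this example.

```cpp
#include <atomic>

// Writer: store the payload, then publish it behind a release fence.
void publish(int& payload, std::atomic<bool>& ready, int value) {
  payload = value;
  std::atomic_thread_fence(std::memory_order_release);  // fence rw,w
  ready.store(true, std::memory_order_relaxed);
}

// Reader: returns true and fills `out` once the payload is published.
bool try_consume(const int& payload, const std::atomic<bool>& ready,
                 int& out) {
  if (!ready.load(std::memory_order_relaxed))
    return false;
  std::atomic_thread_fence(std::memory_order_acquire);  // fence r,rw
  out = payload;
  return true;
}
```

With a fence chosen per memory model, neither side needs to pay for a
full fence rw,rw unless seq_cst is requested.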

2023-04-27 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:

* config/riscv/sync.md (mem_thread_fence_1): Change fence
depending on the given memory model.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Weaken LR/SC pairs

Introduce the %I and %J flags for setting the .aqrl bits on LR/SC pairs
as needed.

Atomic compare and exchange ops provide success and failure memory
models. C++17 and later place no restrictions on the relative strength
of each model, so ensure we cover both by using a model that enforces
the ordering of both given models.
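
One way to picture "a model that enforces the ordering of both given
models" is the sketch below.  It works on std::memory_order rather than
GCC's internal memmodel enum, treats consume as acquire, and uses a
hypothetical name; it is not the riscv_union_memmodels implementation.

```cpp
#include <atomic>

// True if the order includes acquire semantics (consume treated as acquire).
constexpr bool has_acquire(std::memory_order m) {
  return m == std::memory_order_consume || m == std::memory_order_acquire ||
         m == std::memory_order_acq_rel || m == std::memory_order_seq_cst;
}

// True if the order includes release semantics.
constexpr bool has_release(std::memory_order m) {
  return m == std::memory_order_release || m == std::memory_order_acq_rel ||
         m == std::memory_order_seq_cst;
}

// Hypothetical helper: weakest order that is at least as strong as both.
constexpr std::memory_order union_orders(std::memory_order a,
                                         std::memory_order b) {
  if (a == std::memory_order_seq_cst || b == std::memory_order_seq_cst)
    return std::memory_order_seq_cst;
  const bool acq = has_acquire(a) || has_acquire(b);
  const bool rel = has_release(a) || has_release(b);
  if (acq && rel) return std::memory_order_acq_rel;
  if (rel) return std::memory_order_release;
  if (acq) return std::memory_order_acquire;
  return std::memory_order_relaxed;
}
```

For example, a release success order combined with an acquire failure
order gives acq_rel, which neither input provides on its own.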

This change brings LR/SC ops in line with table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_union_memmodels): Expose
riscv_union_memmodels function to sync.md.
* config/riscv/riscv.cc (riscv_union_memmodels): Add function to
get the union of two memmodels in sync.md.
(riscv_print_operand): Add %I and %J flags that output the
optimal LR/SC flag bits for a given memory model.
* config/riscv/sync.md: Remove static .aqrl bits on LR op/.rl
bits on SC op and replace with optimized %I, %J flags.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Eliminate AMO op fences

Atomic operations with the appropriate bits set already enforce release
semantics. Remove unnecessary release fences from atomic ops.

This change brings AMO ops in line with table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:

* config/riscv/riscv.cc
(riscv_memmodel_needs_amo_release): Change function name.
(riscv_print_operand): Remove unneeded %F case.
* config/riscv/sync.md: Remove unneeded fences.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Strengthen atomic stores

This change makes atomic stores strictly stronger than table A.6 of the
ISA manual. This mapping makes the overall patchset compatible with
table A.7 as well.

2023-04-27 Patrick O'Neill <patrick@rivosinc.com>

PR target/89835

gcc/ChangeLog:

* config/riscv/sync.md (atomic_store<mode>): Use simple store
instruction in combination with fence(s).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr89835.c: New test.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Add AMO release bits

This patch sets the relevant .rl bits on amo operations.

2023-04-27 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Change behavior
of %A to include release bits.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Enforce atomic compare_exchange SEQ_CST

This patch enforces SEQ_CST for atomic compare_exchange ops.

Replace Fence/LR.aq/SC.aq pairs with SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:

* config/riscv/sync.md (atomic_cas_value_strong<mode>): Change
FENCE/LR.aq/SC.aq into sequentially consistent LR.aqrl/SC.rl
pair.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Enforce subword atomic LR/SC SEQ_CST

Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:

* config/riscv/sync.md: Change LR.aq/SC.rl pairs into
sequentially consistent LR.aqrl/SC.rl pairs.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Enforce Libatomic LR/SC SEQ_CST

Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.

2023-04-27 Patrick O'Neill <patrick@rivosinc.com>

libgcc/ChangeLog:

* config/riscv/atomic.c: Change LR.aq/SC.rl pairs into
sequentially consistent LR.aqrl/SC.rl pairs.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: Eliminate SYNC memory models

Remove references to MEMMODEL_SYNC_* models by converting via
memmodel_base().

2023-04-27 Patrick O'Neill <patrick@rivosinc.com>

gcc/ChangeLog:

* config/riscv/riscv.cc: Remove MEMMODEL_SYNC_* cases and
sanitize memmodel input with memmodel_base.

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

libstdc++: Regenerate baseline_symbols.txt files for Linux

The following patch regenerates the ABI files (I've only changed the
Linux files which were updated recently (last month)).

2023-05-02 Jakub Jelinek <jakub@redhat.com>

* config/abi/post/aarch64-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/i486-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/m68k-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/riscv64-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/s390x-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt: Update.
* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Update.

RISC-V: Name newly added flags in changelog

This patch fixes the changelog to explicitly name the added command line
flags introduced in this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616807.html

2023-05-01 Patrick O'Neill <patrick@rivosinc.com>

Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>

RISC-V: ICE for vlmul_ext_v intrinsic API

PR target/109617

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Support VNx2HI and VNx4DI when MIN_VLEN >= 128.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vlmul_ext-1.c: New test.

Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>
Co-authored-by: Pan Li <pan2.li@intel.com>
Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>

RISC-V: fix build issue with gcc 4.9.x

GCC should still build with GCC 4.8.3 or newer [1]
using C++03 by default. But a recent change in
RISC-V port introduced a C++11 feature "std::log2" [2].

Use log2 from the C header, without the namespace [3].

[1] https://gcc.gnu.org/install/prerequisites.html
[2] https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=7caa1ae5e451e780fbc4746a54e3f19d4f4304dc
[3] https://stackoverflow.com/questions/26733413/error-log2-is-not-a-member-of-std
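
A minimal sketch of the idea, assuming a hypothetical helper named
log2_of_pow2 (this is not the code in genrvv-type-indexer.cc): include the
C header and call log2 unqualified, so that no C++11 library support is
required.

```cpp
#include <math.h> // C header: declares log2 in the global namespace

// Hypothetical helper: base-2 logarithm of a power-of-two value.
inline int log2_of_pow2(unsigned value)
{
  // Unqualified call; std::log2 is only guaranteed by C++11 and later.
  return static_cast<int>(log2(static_cast<double>(value)));
}
```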

Fixes:
https://gitlab.com/buildroot.org/toolchains-builder/-/jobs/4202276589

gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc: Use log2 from the C header, without
the namespace.

Signed-off-by: Romain Naour <romain.naour@gmail.com>

c++: Add testcase for already fixed PR [PR109506]

The PR109666 fix r14-386-g07c52d1eec967 incidentally also fixes this PR.

PR c++/109506

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-template26.C: New test.

docs: port documentation of VRP params

gcc/ChangeLog:

* doc/invoke.texi: Update documentation based on param.opt file.

tree-optimization/109672 - properly check emulated plus during vect

The following refactors the check for emulated vector support for
the cases of plus, minus and negate. In the PR we end up with
a SImode plus, supported by the target but emulated and in this
context fail to verify we are dealing with exactly word_mode.
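
For readers unfamiliar with emulated vector arithmetic, the sketch below
shows the general trick for a 32-bit word holding four 8-bit lanes. The
function name add_u8x4 is hypothetical and the code is only an
illustration of why the operation has to be carried out on a full word;
it is not taken from the vectorizer.

```cpp
#include <cstdint>

// Adds four unsigned 8-bit lanes packed into one 32-bit word, without
// letting a carry cross from one lane into the next.
inline std::uint32_t add_u8x4(std::uint32_t a, std::uint32_t b)
{
  const std::uint32_t high = 0x80808080u;              // top bit of each lane
  std::uint32_t low_sum = (a & ~high) + (b & ~high);   // carries stay in-lane
  return low_sum ^ ((a ^ b) & high);                   // add top bits, drop carry-out
}
```

The masking only works when the scalar addition covers exactly the word
that holds the lanes, which is the property the check verifies.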

PR tree-optimization/109672
* tree-vect-stmts.cc (vectorizable_operation): For plus,
minus and negate always check the vector mode is word mode.

[i386] Fix testcases for emulated scatter

The following adjusts testcases where the pr88531 tests fail with -m32
because we do not consider MMX size vectorization there and the
pr89618 test runs into load/store cost differences with -m32.

* gcc.target/i386/pr88531-2a.c: Skip scanning for ia32.
* gcc.target/i386/pr88531-2b.c: Likewise.
* gcc.target/i386/pr88531-2c.c: Likewise.
* gcc.target/i386/pr89618-2.c: Likewise. Disable AVX512.

libstdc++: Shut up -Wattribute-alias warning [PR109694]

I've followed what other files do, using attribute alias with not really
matching function type (after all, it isn't really possible when it is a
constructor), but seems I've missed it warns:
../../../../../libstdc++-v3/src/c++98/ios_init.cc:203:8: warning: ‘void std::ios_base_library_init()’ alias between functions of incompatible types ‘void()’ and ‘void (std::ios_base::Init::)()’ [-Wattribute-alias=]
  203 |   void ios_base_library_init (void)
      |        ^~~~~~~~~~~~~~~~~~~~~
../../../../../libstdc++-v3/src/c++98/ios_init.cc:78:3: note: aliased declaration here
   78 |   ios_base::Init::Init()
      |   ^~~~~~~~
The PR talks about clang++ warning there (which I think isn't really
supported, libstdc++ sources ought to be built by GCC), but it warns
when built with GCC too.

The following patch fixes it by doing what other libstdc++ sources do in
those cases.

2023-05-02  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/109694
* src/c++98/ios_init.cc: Add #pragma GCC diagnostic ignored for
-Wattribute-alias.

Daily bump.

ubsan: ubsan_maybe_instrument_array_ref tweak

In <https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613687.html>
we discussed that the copy_node in ubsan_maybe_instrument_array_ref
is redundant, but also that it'd be best to postpone the optimization
to GCC 14. So I'm making that change now.

gcc/c-family/ChangeLog:

* c-ubsan.cc (ubsan_maybe_instrument_array_ref): Don't copy_node.

c++: array DMI and member fn [PR109666]

Here it turns out I also needed to adjust cfun when stepping out of the
member function to instantiate the DMI. But instead of adding that tweak,
let's unify with instantiate_body and just push_to_top_level instead of
trying to do the minimum subset of it. There was no measurable change in
compile time on stdc++.h.

This should also resolve 109506 without yet another tweak.

PR c++/109666

gcc/cp/ChangeLog:

* name-lookup.cc (maybe_push_to_top_level)
(maybe_pop_from_top_level): Split out...
* pt.cc (instantiate_body): ...from here.
* init.cc (maybe_instantiate_nsdmi_init): Use them.
* name-lookup.h: Declare them.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-array2.C: New test.

PHIOPT: Update comment about what the pass now does

I noticed I didn't update the comment about how the pass
works after I initially added match_simplify_replacement.
Anyways this updates the comment to be the current state
of the pass.

OK?

gcc/ChangeLog:

* tree-ssa-phiopt.cc: Update comment about
how the transformations are implemented.

Convert xstormy16 to LRA

This patch converts the xstormy16 port to LRA. It introduces a code
quality regression in the shiftsi testcase, but it also fixes numerous
aborts/errors. IMHO it's a good tradeoff.

gcc/

* config/stormy16/stormy16.cc (TARGET_LRA_P): Remove definition.

Enable LRA on several ports

Spurred by Segher's RFC, I went ahead and tested several ports with LRA
enabled.  Not surprisingly, many failed, but a few built their full set
of libraries successfully and of those a few even ran their testsuites
with no regressions.  In fact, enabling LRA fixes a small number of
failures on the iq2000 port.

This patch converts the ports which built their libraries and have test
results that are as good as or better than without LRA.    There may
be minor code quality regressions or there may be minor code quality
improvements -- I'm leaving that for the port maintainers to own going
forward.

gcc/

* config/cris/cris.cc (TARGET_LRA_P): Remove.
* config/epiphany/epiphany.cc (TARGET_LRA_P): Remove.
* config/iq2000/iq2000.cc (TARGET_LRA_P): Remove.
* config/m32r/m32r.cc (TARGET_LRA_P): Remove.
* config/microblaze/microblaze.cc (TARGET_LRA_P): Remove.
* config/mmix/mmix.cc (TARGET_LRA_P): Remove.

apply debug-remap to file names in .su files

The .su files generated with -fstack-usage are arguably debug info. In
order to make builds more reproducible, apply the same remapping logic
to the recorded file names as for when producing the debug info
embedded in the object files.

To this end, teach print_decl_identifier() a new
PRINT_DECL_REMAP_DEBUG flag and use that from output_stack_usage_1().
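
The effect on a recorded file name can be pictured as a prefix rewrite of
the kind requested with -fdebug-prefix-map=OLD=NEW. The sketch below uses
a hypothetical helper, remap_prefix, with a single mapping; it is not the
remapping code GCC uses.

```cpp
#include <string>

// Hypothetical stand-in for one OLD=NEW prefix mapping: if FILE starts
// with OLD_PREFIX, that prefix is replaced by NEW_PREFIX.
inline std::string remap_prefix(const std::string &file,
                                const std::string &old_prefix,
                                const std::string &new_prefix)
{
  if (file.compare(0, old_prefix.size(), old_prefix) == 0)
    return new_prefix + file.substr(old_prefix.size());
  return file; // no match: the name is recorded unchanged
}
```

With such a mapping, a build in /home/user/src and a build in /tmp/build
can both record "./foo.c" in the .su file.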

gcc/ChangeLog:

* print-tree.h (PRINT_DECL_REMAP_DEBUG): New flag.
* print-tree.cc (print_decl_identifier): Implement it.
* toplev.cc (output_stack_usage_1): Use it.

libgcc pru: Define TARGET_HAS_NO_HW_DIVIDE

This patch aligns the configuration to the actual PRU capabilities. It
also reduces the size of the affected libgcc functions.

For a real-world project using integer arithmetic the savings
are significant:

  Before:
     text    data     bss     dec     hex filename
     3688     865     544    5097    13e9 hc-sr04-range-sensor.elf

  With TARGET_HAS_NO_HW_DIVIDE defined:
     text    data     bss     dec     hex filename
     2824     865     544    4233    1089 hc-sr04-range-sensor.elf

Execution speed also appears to have improved. The moddi3 function is
now executed in half the CPU cycles.
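
In the figures above the text size drops from 3688 to 2824 bytes, a saving
of 864 bytes. As background, division on a target without a hardware
divider is typically done one bit at a time. The sketch below shows a
generic shift-and-subtract division; the names udivmod_result and udivmod64
are hypothetical and this is not the libgcc routine.

```cpp
#include <cstdint>

struct udivmod_result { std::uint64_t quot; std::uint64_t rem; };

// Restoring (shift-and-subtract) division.  DEN must be nonzero.
inline udivmod_result udivmod64(std::uint64_t num, std::uint64_t den)
{
  std::uint64_t quot = 0, rem = 0;
  for (int bit = 63; bit >= 0; --bit)
    {
      std::uint64_t carry = rem >> 63;           // bit about to be shifted out
      rem = (rem << 1) | ((num >> bit) & 1u);    // bring down the next bit
      if (carry || rem >= den)
        {
          rem -= den;                            // wraps correctly when carry is set
          quot |= std::uint64_t{1} << bit;
        }
    }
  return {quot, rem};
}
```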

libgcc/ChangeLog:

* config/pru/t-pru (HOST_LIBGCC2_CFLAGS): Add
-DTARGET_HAS_NO_HW_DIVIDE.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

Remove unused friends in int_range<>.

gcc/ChangeLog:

* value-range.h (class int_range): Remove gt_ggc_mx and gt_pch_nx
friends.

Inline irange::set_nonzero.

irange::set_nonzero is used everywhere and benefits immensely from
inlining.

gcc/ChangeLog:

* value-range.h (irange::set_nonzero): Inline.

Cleanup irange::set.

Now that anti-ranges are no more and iranges contain wide_ints instead
of trees, various cleanups are possible. This is one of a handful of
patches improving the performance of irange::set() which is not on a
hot path, but quite sensitive because it is so pervasive.

gcc/ChangeLog:

* gimple-range-op.cc (cfn_ffs::fold_range): Use the correct
precision.
* gimple-ssa-warn-alloca.cc (alloca_call_type): Use <2> for
invalid_range, as it is an inverse range.
* tree-vrp.cc (find_case_label_range): Avoid trees.
* value-range.cc (irange::irange_set): Delete.
(irange::irange_set_1bit_anti_range): Delete.
(irange::irange_set_anti_range): Delete.
(irange::set): Cleanup.
* value-range.h (class irange): Remove irange_set,
irange_set_anti_range, irange_set_1bit_anti_range.
(irange::set_undefined): Remove set to m_type.

Convert internal representation of irange to wide_ints.

gcc/ChangeLog:

* range-op.cc (update_known_bitmask): Adjust for irange containing
wide_ints internally.
* tree-ssanames.cc (set_nonzero_bits): Same.
* tree-ssanames.h (set_nonzero_bits): Same.
* value-range-storage.cc (irange_storage::set_irange): Same.
(irange_storage::get_irange): Same.
* value-range.cc (irange::operator=): Same.
(irange::irange_set): Same.
(irange::irange_set_1bit_anti_range): Same.
(irange::irange_set_anti_range): Same.
(irange::set): Same.
(irange::verify_range): Same.
(irange::contains_p): Same.
(irange::irange_single_pair_union): Same.
(irange::union_): Same.
(irange::irange_contains_p): Same.
(irange::intersect): Same.
(irange::invert): Same.
(irange::set_range_from_nonzero_bits): Same.
(irange::set_nonzero_bits): Same.
(mask_to_wi): Same.
(irange::intersect_nonzero_bits): Same.
(irange::union_nonzero_bits): Same.
(gt_ggc_mx): Same.
(gt_pch_nx): Same.
(tree_range): Same.
(range_tests_strict_enum): Same.
(range_tests_misc): Same.
(range_tests_nonzero_bits): Same.
* value-range.h (irange::type): Same.
(irange::varying_compatible_p): Same.
(irange::irange): Same.
(int_range::int_range): Same.
(irange::set_undefined): Same.
(irange::set_varying): Same.
(irange::lower_bound): Same.
(irange::upper_bound): Same.

Rewrite bounds_of_var_in_loop() to use ranges.

Little by little, bounds_of_var_in_loop() has grown into an
unmaintainable mess. This patch rewrites the code to use the relevant
APIs as well as refactor it to make it more readable.

gcc/ChangeLog:

* gimple-range-fold.cc (tree_lower_bound): Delete.
(tree_upper_bound): Delete.
(vrp_val_max): Delete.
(vrp_val_min): Delete.
(fold_using_range::range_of_ssa_name_with_loop_info): Call
range_of_var_in_loop.
* vr-values.cc (valid_value_p): Delete.
(fix_overflow): Delete.
(get_scev_info): New.
(bounds_of_var_in_loop): Refactor into...
(induction_variable_may_overflow_p): ...this,
(range_from_loop_direction): ...and this,
(range_of_var_in_loop): ...and this.
* vr-values.h (bounds_of_var_in_loop): Delete.
(range_of_var_in_loop): New.

Replace vrp_val* with wide_ints.

This patch removes all uses of vrp_val_{min,max} in favor of
irange_val_*, which are wide_int based.  This will leave only one use
of vrp_val_* which returns trees in range_of_ssa_name_with_loop_info()
because it needs to work with non-integers (floats, etc).  In a
follow-up patch, this function will also be cleaned up such that
vrp_val_* can be deleted.

The functions min_limit and max_limit in range-op.cc are now useless
as they're basically irange_val*.  I didn't rename them yet to avoid
churn.  I'll do it in a later patch.

gcc/ChangeLog:

* gimple-range-fold.cc (adjust_pointer_diff_expr): Rewrite with
irange_val*.
(vrp_val_max): New.
(vrp_val_min): New.
* gimple-range-op.cc (cfn_strlen::fold_range): Use irange_val_*.
* range-op.cc (max_limit): Same.
(min_limit): Same.
(plus_minus_ranges): Same.
(operator_rshift::op1_range): Same.
(operator_cast::inside_domain_p): Same.
* value-range.cc (vrp_val_is_max): Delete.
(vrp_val_is_min): Delete.
(range_tests_misc): Use irange_val_*.
* value-range.h (vrp_val_is_min): Delete.
(vrp_val_is_max): Delete.
(vrp_val_max): Delete.
(irange_val_min): New.
(vrp_val_min): Delete.
(irange_val_max): New.
* vr-values.cc (check_for_binary_op_overflow): Use irange_val_*.

Conversion to irange wide_int API.

This converts the irange API to use wide_ints exclusively, along with
its users.

This patch will slow down VRP, as there will be more useless
wide_int to tree conversions. However, this slowdown is only
temporary, as a follow-up patch will convert the internal
representation of iranges to wide_ints for a net overall gain
in performance.

gcc/ChangeLog:

* fold-const.cc (expr_not_equal_to): Convert to irange wide_int API.
* gimple-fold.cc (size_must_be_zero_p): Same.
* gimple-loop-versioning.cc
(loop_versioning::prune_loop_conditions): Same.
* gimple-range-edge.cc (gcond_edge_range): Same.
(gimple_outgoing_range::calc_switch_ranges): Same.
* gimple-range-fold.cc (adjust_imagpart_expr): Same.
(adjust_realpart_expr): Same.
(fold_using_range::range_of_address): Same.
(fold_using_range::relation_fold_and_or): Same.
* gimple-range-gori.cc (gori_compute::gori_compute): Same.
(range_is_either_true_or_false): Same.
* gimple-range-op.cc (cfn_toupper_tolower::get_letter_range): Same.
(cfn_clz::fold_range): Same.
(cfn_ctz::fold_range): Same.
* gimple-range-tests.cc (class test_expr_eval): Same.
* gimple-ssa-warn-alloca.cc (alloca_call_type): Same.
* ipa-cp.cc (ipa_value_range_from_jfunc): Same.
(propagate_vr_across_jump_function): Same.
(decide_whether_version_node): Same.
* ipa-prop.cc (ipa_get_value_range): Same.
* ipa-prop.h (ipa_range_set_and_normalize): Same.
* range-op.cc (get_shift_range): Same.
(value_range_from_overflowed_bounds): Same.
(value_range_with_overflow): Same.
(create_possibly_reversed_range): Same.
(equal_op1_op2_relation): Same.
(not_equal_op1_op2_relation): Same.
(lt_op1_op2_relation): Same.
(le_op1_op2_relation): Same.
(gt_op1_op2_relation): Same.
(ge_op1_op2_relation): Same.
(operator_mult::op1_range): Same.
(operator_exact_divide::op1_range): Same.
(operator_lshift::op1_range): Same.
(operator_rshift::op1_range): Same.
(operator_cast::op1_range): Same.
(operator_logical_and::fold_range): Same.
(set_nonzero_range_from_mask): Same.
(operator_bitwise_or::op1_range): Same.
(operator_bitwise_xor::op1_range): Same.
(operator_addr_expr::fold_range): Same.
(pointer_plus_operator::wi_fold): Same.
(pointer_or_operator::op1_range): Same.
(INT): Same.
(UINT): Same.
(INT16): Same.
(UINT16): Same.
(SCHAR): Same.
(UCHAR): Same.
(range_op_cast_tests): Same.
(range_op_lshift_tests): Same.
(range_op_rshift_tests): Same.
(range_op_bitwise_and_tests): Same.
(range_relational_tests): Same.
* range.cc (range_zero): Same.
(range_nonzero): Same.
* range.h (range_true): Same.
(range_false): Same.
(range_true_and_false): Same.
* tree-data-ref.cc (split_constant_offset_1): Same.
* tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Same.
* tree-ssa-loop-unswitch.cc (struct unswitch_predicate): Same.
(find_unswitching_predicates_for_bb): Same.
* tree-ssa-phiopt.cc (value_replacement): Same.
* tree-ssa-threadbackward.cc
(back_threader::find_taken_edge_cond): Same.
* tree-ssanames.cc (ssa_name_has_boolean_range): Same.
* tree-vrp.cc (find_case_label_range): Same.
* value-query.cc (range_query::get_tree_range): Same.
* value-range.cc (irange::set_nonnegative): Same.
(frange::contains_p): Same.
(frange::singleton_p): Same.
(frange::internal_singleton_p): Same.
(irange::irange_set): Same.
(irange::irange_set_1bit_anti_range): Same.
(irange::irange_set_anti_range): Same.
(irange::set): Same.
(irange::operator==): Same.
(irange::singleton_p): Same.
(irange::contains_p): Same.
(irange::set_range_from_nonzero_bits): Same.
(DEFINE_INT_RANGE_INSTANCE): Same.
(INT): Same.
(UINT): Same.
(SCHAR): Same.
(UINT128): Same.
(UCHAR): Same.
(range): New.
(tree_range): New.
(range_int): New.
(range_uint): New.
(range_uint128): New.
(range_uchar): New.
(range_char): New.
(build_range3): Convert to irange wide_int API.
(range_tests_irange3): Same.
(range_tests_int_range_max): Same.
(range_tests_strict_enum): Same.
(range_tests_misc): Same.
(range_tests_nonzero_bits): Same.
(range_tests_nan): Same.
(range_tests_signed_zeros): Same.
* value-range.h (Value_Range::Value_Range): Same.
(irange::set): Same.
(irange::nonzero_p): Same.
(irange::contains_p): Same.
(range_includes_zero_p): Same.
(irange::set_nonzero): Same.
(irange::set_zero): Same.
(contains_zero_p): Same.
(frange::contains_p): Same.
* vr-values.cc
(simplify_using_ranges::op_with_boolean_value_range_p): Same.
(bounds_of_var_in_loop): Same.
(simplify_using_ranges::legacy_fold_cond_overflow): Same.

Merge irange::union/intersect into irange_union/intersect.

gcc/ChangeLog:

* value-range.cc (irange::irange_union): Rename to...
(irange::union_): ...this.
(irange::irange_intersect): Rename to...
(irange::intersect): ...this.
* value-range.h (irange::union_): Delete.
(irange::intersect): Delete.

Convert get_legacy_range in bounds_of_var_in_loop to irange API.

gcc/ChangeLog:

* vr-values.cc (bounds_of_var_in_loop): Convert to irange API.

Various cleanups in vr-values.cc towards ranger API.

gcc/ChangeLog:

* vr-values.cc (check_for_binary_op_overflow): Tidy up by using
ranger API.
(compare_ranges): Delete.
(compare_range_with_value): Delete.
(bounds_of_var_in_loop): Tidy up by using ranger API.
(simplify_using_ranges::fold_cond_with_ops): Cleanup and rename
from vrp_evaluate_conditional_warnv_with_ops_using_ranges.
(simplify_using_ranges::legacy_fold_cond_overflow): Remove
strict_overflow_p and only_ranges.
(simplify_using_ranges::legacy_fold_cond): Adjust call to
legacy_fold_cond_overflow.
(simplify_using_ranges::simplify_abs_using_ranges): Adjust for
rename.
(range_fits_type_p): Rename value_range to irange.
* vr-values.h (range_fits_type_p): Adjust prototype.

Remove irange::tree_{lower,upper}_bound.

gcc/ChangeLog:

* value-range.cc (irange::irange_set_anti_range): Remove uses of
tree_lower_bound and tree_upper_bound.
(irange::verify_range): Same.
(irange::operator==): Same.
(irange::singleton_p): Same.
* value-range.h (irange::tree_lower_bound): Delete.
(irange::tree_upper_bound): Delete.
(irange::lower_bound): Delete.
(irange::upper_bound): Delete.
(irange::zero_p): Remove uses of tree_lower_bound and
tree_upper_bound.

Remove irange::{min,max,kind}.

gcc/ChangeLog:

* tree-ssa-loop-niter.cc (refine_value_range_using_guard): Remove
kind() call.
(determine_value_range): Same.
(record_nonwrapping_iv): Same.
(infer_loop_bounds_from_signedness): Same.
(scev_var_range_cant_overflow): Same.
* tree-vrp.cc (operand_less_p): Delete.
* tree-vrp.h (operand_less_p): Delete.
* value-range.cc (get_legacy_range): Remove uses of deprecated API.
(irange::value_inside_range): Delete.
* value-range.h (vrange::kind): Delete.
(irange::num_pairs): Remove check of m_kind.
(irange::min): Delete.
(irange::max): Delete.

vrange_storage overhaul

[tl;dr: This is a rewrite of value-range-storage.* such that global
ranges and the internal ranger cache can use the same efficient
storage mechanism.  It is optimized such that when wide_ints are
dropped into irange, the copying back and forth from storage will be
very fast, while being able to hold any number of sub-ranges
dynamically allocated at run-time.  This replaces the global storage
mechanism which was limited to 6 sub-ranges.]

Previously we had a vrange allocator for use in the ranger cache.  It
worked with trees and could be used in place (fast), but it was not
memory efficient.  With the upcoming switch to wide_ints for irange,
we can't afford to allocate ranges that can be used in place, because
an irange will be significantly larger, as it will hold full
wide_ints.  We need a trailing_wide_int mechanism similar to what we
use for global ranges, but fast enough to use in the ranger's cache.

The global ranges had another allocation mechanism that was
trailing_wide_int based.  It was memory efficient but slow given the
constant conversions from trees to wide_ints.

This patch gets us the best of both worlds by providing a storage
mechanism with a custom trailing wide int interface, while at the same
time being fast enough to use in the ranger cache.

We use a custom trailing wide_int mechanism but more flexible than
trailing_wide_int, since the latter has compile-time fixed-sized
wide_ints.  The original TWI structure has the current length of each
wide_int in a static portion preceding the variable length:

template <int N>
struct GTY((user)) trailing_wide_ints
{
...
...
  /* The current length of each number.
     Avoid char array so the whole structure is not a typeless storage
     that will, in turn, turn off TBAA on gimple, trees and RTL.  */
  struct {unsigned char len;} m_len[N];

  /* The variable-length part of the structure, which always contains
     at least one HWI.  Element I starts at index I * M_MAX_LEN.  */
  HOST_WIDE_INT m_val[1];
};

We need both m_len[] and m_val[] to be variable-length at run-time.
In the previous incarnation of the storage mechanism the limitation of
m_len[] being static meant that we were limited to whatever [N] could
use up the unused bits in the TWI control word.  In practice this
meant we were limited to 6 sub-ranges.  This worked fine for global
ranges, but is a no go for our internal cache, where we must represent
things exactly (ranges for switches, etc).

The new implementation removes this restriction by making both m_len[]
and m_val[] variable length.  Also, rolling our own allows future
optimizations by using some of the leftover bits in the control word.
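
To make the layout idea concrete, here is a minimal sketch of a container
whose values and per-element lengths both live in one buffer sized at run
time. The class trailing_storage and its layout (all value slots first,
then one length byte per element) are hypothetical; this is not the
irange_storage code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical layout: [num * max_len 64-bit words][num length bytes].
class trailing_storage
{
public:
  trailing_storage(unsigned num, unsigned max_len)
    : m_num(num), m_max_len(max_len),
      m_buf(new unsigned char[value_bytes(num, max_len) + num]())
  {}

  // Store LEN words for element I; LEN must not exceed max_len.
  void set(unsigned i, const std::int64_t *words, unsigned char len)
  {
    std::memcpy(value_addr(i), words, len * sizeof(std::int64_t));
    lengths()[i] = len;
  }

  unsigned char length(unsigned i) const { return lengths()[i]; }

  // Read word W of element I.
  std::int64_t word(unsigned i, unsigned w) const
  {
    std::int64_t v;
    std::memcpy(&v, value_addr(i) + w * sizeof(std::int64_t), sizeof v);
    return v;
  }

private:
  static std::size_t value_bytes(unsigned num, unsigned max_len)
  { return std::size_t(num) * max_len * sizeof(std::int64_t); }

  unsigned char *value_addr(unsigned i) const
  { return m_buf.get() + value_bytes(i, m_max_len); }

  // The length bytes start right after the last value slot.
  unsigned char *lengths() const
  { return m_buf.get() + value_bytes(m_num, m_max_len); }

  unsigned m_num, m_max_len;
  std::unique_ptr<unsigned char[]> m_buf;
};
```

Because neither array has a compile-time bound, the number of elements is
limited only by the allocation, which is the property the 6 sub-range
limit lacked.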

Also, in preparation for the wide_int conversion, vrange_storage is
now optimized to blast the bits directly into the ultimate irange
instead of going through the irange API.  So ultimately copying back
and forth between the ranger cache and the storage mechanism is just a
matter of copying a few bits for the control word, and copying an
array of HOST_WIDE_INTs.  These changes were heavily profiled, and
yielded a good chunk of the overall speedup for the wide_int
conversion.

Finally, vrange_storage is now a first class structure with GTY
markers and all, thus alleviating the void * hack in struct
tree_ssa_name and friends.  This removes a few warts in the API and
looks cleaner overall.

gcc/ChangeLog:

* gimple-fold.cc (maybe_fold_comparisons_from_match_pd): Adjust
for vrange_storage.
* gimple-range-cache.cc (sbr_vector::sbr_vector): Same.
(sbr_vector::grow): Same.
(sbr_vector::set_bb_range): Same.
(sbr_vector::get_bb_range): Same.
(sbr_sparse_bitmap::sbr_sparse_bitmap): Same.
(sbr_sparse_bitmap::set_bb_range): Same.
(sbr_sparse_bitmap::get_bb_range): Same.
(block_range_cache::block_range_cache): Same.
(ssa_global_cache::ssa_global_cache): Same.
(ssa_global_cache::get_global_range): Same.
(ssa_global_cache::set_global_range): Same.
* gimple-range-cache.h: Same.
* gimple-range-edge.cc
(gimple_outgoing_range::gimple_outgoing_range): Same.
(gimple_outgoing_range::switch_edge_range): Same.
(gimple_outgoing_range::calc_switch_ranges): Same.
* gimple-range-edge.h: Same.
* gimple-range-infer.cc
(infer_range_manager::infer_range_manager): Same.
(infer_range_manager::get_nonzero): Same.
(infer_range_manager::maybe_adjust_range): Same.
(infer_range_manager::add_range): Same.
* gimple-range-infer.h: Rename obstack_vrange_allocator to
vrange_allocator.
* tree-core.h (struct irange_storage_slot): Remove.
(struct tree_ssa_name): Remove irange_info and frange_info.  Make
range_info a pointer to vrange_storage.
* tree-ssanames.cc (range_info_fits_p): Adjust for vrange_storage.
(range_info_alloc): Same.
(range_info_free): Same.
(range_info_get_range): Same.
(range_info_set_range): Same.
(get_nonzero_bits): Same.
* value-query.cc (get_ssa_name_range_info): Same.
* value-range-storage.cc (class vrange_internal_alloc): New.
(class vrange_obstack_alloc): New.
(class vrange_ggc_alloc): New.
(vrange_allocator::vrange_allocator): New.
(vrange_allocator::~vrange_allocator): New.
(vrange_storage::alloc_slot): New.
(vrange_allocator::alloc): New.
(vrange_allocator::free): New.
(vrange_allocator::clone): New.
(vrange_allocator::clone_varying): New.
(vrange_allocator::clone_undefined): New.
(vrange_storage::alloc): New.
(vrange_storage::set_vrange): Remove slot argument.
(vrange_storage::get_vrange): Same.
(vrange_storage::fits_p): Same.
(vrange_storage::equal_p): New.
(irange_storage::write_lengths_address): New.
(irange_storage::lengths_address): New.
(irange_storage_slot::alloc_slot): Remove.
(irange_storage::alloc): New.
(irange_storage_slot::irange_storage_slot): Remove.
(irange_storage::irange_storage): New.
(write_wide_int): New.
(irange_storage_slot::set_irange): Remove.
(irange_storage::set_irange): New.
(read_wide_int): New.
(irange_storage_slot::get_irange): Remove.
(irange_storage::get_irange): New.
(irange_storage_slot::size): Remove.
(irange_storage::equal_p): New.
(irange_storage_slot::num_wide_ints_needed): Remove.
(irange_storage::size): New.
(irange_storage_slot::fits_p): Remove.
(irange_storage::fits_p): New.
(irange_storage_slot::dump): Remove.
(irange_storage::dump): New.
(frange_storage_slot::alloc_slot): Remove.
(frange_storage::alloc): New.
(frange_storage_slot::set_frange): Remove.
(frange_storage::set_frange): New.
(frange_storage_slot::get_frange): Remove.
(frange_storage::get_frange): New.
(frange_storage_slot::fits_p): Remove.
(frange_storage::equal_p): New.
(frange_storage::fits_p): New.
(ggc_vrange_allocator): New.
(ggc_alloc_vrange_storage): New.
* value-range-storage.h (class vrange_storage): Rewrite.
(class irange_storage): Rewrite.
(class frange_storage): Rewrite.
(class obstack_vrange_allocator): Remove.
(class ggc_vrange_allocator): Remove.
(vrange_allocator::alloc_vrange): Remove.
(vrange_allocator::alloc_irange): Remove.
(vrange_allocator::alloc_frange): Remove.
(ggc_alloc_vrange_storage): New.
* value-range.h (class irange): Rename vrange_allocator to
irange_storage.
(class frange): Same.

Daily bump.

Revert "[PATCH] libcpp: suppress builtin macro redefined warnings for __LINE__"

This reverts commit e7ce7c4905fd254760b1cd187752a03bc0c148ba.

[Committed] Update xstormy16's neghi2 pattern to not clobber the carry flag.

When I converted xstormy's neghi2 pattern from a define_expand to a
define_insn, I forgot that define_expand implicitly produces a
sequence of instructions, but a define_insn is an implicit parallel,
thereby messing up the clobber (reg:BI CARRY_REG), which can then cause
an ICE in the auto-generated added_clobbers_hard_reg_p. Whilst stripping
the superfluous PARALLEL resolves this issue, an even better fix is to
use xstormy16's INC instruction, that (like NOT) doesn't affect the carry
flag, resulting in a neghi2 implementation that can more easily be CSE'd
and scheduled.

Many thanks (again) to Jeff Law for testing/reporting this issue.

2023-04-30 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/stormy16/stormy16.md (neghi2): Rewrite pattern using
inc to avoid clobbering the carry flag.

gcc/testsuite/ChangeLog
* gcc.target/xstormy16/neghi2.c: Update expected implementation.

Improve error message for excess elements in array initializer from {"a"}

So char arrays are not the only type that can be initialized from {"a"}.
We can have wchar_t (L"") and char16_t (u"") types too. So let's
print out the type of the array instead of just saying char.
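
For reference, these are the valid forms of such initializers, written in
syntax that C and C++ share (the C++ spelling is used here so the snippet
compiles on its own). The array names and the element_count helper are
hypothetical.

```cpp
#include <cstddef>

// Each array is initialized from a braced string literal of matching kind.
static const char     narrow_arr[] = {"a"};
static const wchar_t  wide_arr[]   = {L"a"};
static const char16_t utf16_arr[]  = {u"a"};

// Number of elements in an array, including the terminating null.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }
```

Adding a second element after the string in any of these is the "excess
elements" case whose message now names the actual array type.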

Note in the testsuite I used regex . to match '[' and ']' as
I could not figure out how many '\' I needed.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/c/ChangeLog:

* c-typeck.cc (process_init_element): Print out array type
for excessive elements.

gcc/testsuite/ChangeLog:

* gcc.dg/init-bad-1.c: Update error message.
* gcc.dg/init-bad-2.c: Likewise.
* gcc.dg/init-bad-3.c: Likewise.
* gcc.dg/init-excess-3.c: Likewise.
* gcc.dg/pr61096-1.c: Likewise.

Fix C/107926: Wrong error message when initializing char array

The problem here is the code which handles {"a"} is supposed
to handle the case where there is something after the string but
it only handles the case where there is another string so
we go down the other path and error out saying "excess elements
in struct initializer" even though this was a character array.
To fix this, we need to move the check if the initializer is
a string after the check for array and initializer.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

gcc/c/ChangeLog:

PR c/107926
* c-typeck.cc (process_init_element): Move the check
for string cst until after the error message.

gcc/testsuite/ChangeLog:

PR c/107926
* gcc.dg/init-excess-3.c: New test.

MATCH: add some of what phiopt's builtin_zero_pattern did

This adds the patterns for
POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
For "a != 0 ? FUNC(a) : CST".
CLRSB, CLRSBL, and CLRSBLL will be moved next.
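
A small sketch of the source shape involved, using GCC's documented
builtins. The function names are hypothetical, and only POPCOUNT and FFS
are shown because both builtins are defined to return 0 for a zero
argument, which makes the guard redundant when CST is 0.

```c
/* Hypothetical helpers: the guarded form "a != 0 ? FUNC(a) : CST". */
int popcount_guarded(unsigned a)
{
  return a != 0 ? __builtin_popcount(a) : 0;
}

int ffs_guarded(int a)
{
  return a != 0 ? __builtin_ffs(a) : 0;
}

/* The unguarded forms compute the same values for every input. */
int popcount_plain(unsigned a) { return __builtin_popcount(a); }
int ffs_plain(int a) { return __builtin_ffs(a); }
```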

Note this is not enough to remove
cond_removal_in_builtin_zero_pattern as we need to handle
the case where there is an NOP_CONVERT inside the conditional
to move out of the condition inside match_simplify_replacement.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd: Add patterns for "a != 0 ? FUNC(a) : CST"
for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.

PHIOPT: Allow moving of some builtin calls

While working on moving
cond_removal_in_builtin_zero_pattern to match, I noticed
that functions were not allowed to move as we reject all
non-assignments.
This changes it to allow a few calls which are known not
to throw/trap. Right now it is restricted to the ones
which cond_removal_in_builtin_zero_pattern handles, but
adding more is just a matter of adding them to the switch statement.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p):
Allow some builtin/internal function calls which
are known not to trap/throw.
(phiopt_worker::match_simplify_replacement):
Use name instead of getting the lhs again.

hwasan: adjust wording in expected output in tests

gcc/testsuite/ChangeLog:

* c-c++-common/hwasan/asan-pr70541.c: Adjust wording of expected
output.
* c-c++-common/hwasan/heap-overflow.c: Likewise.
* c-c++-common/hwasan/sanity-check-pure-c.c: Likewise.
* c-c++-common/hwasan/use-after-free.c: Likewise.

libsanitizer: link hwasan against lsan library

Similarly to libasan.so, libhwasan.so also utilizes some
of the symbols from lsan library.

PR sanitizer/109674

libsanitizer/ChangeLog:

* hwasan/Makefile.am: Depend on liblsan.
* hwasan/Makefile.in: Re-generate.

[PATCH] libcpp: suppress builtin macro redefined warnings for __LINE__

From 0821df518b264e754d698d399f98be1a62945e32 Mon Sep 17 00:00:00 2001
From: Longjun Luo <luolongjuna@gmail.com>
Date: Thu, 12 Jan 2023 23:59:54 +0800
Subject: [PATCH] libcpp: suppress builtin macro redefined warnings for
__LINE__

As implied in
gcc.gnu.org/legacy-ml/gcc-patches/2008-09/msg00076.html,
gcc provides -Wno-builtin-macro-redefined to suppress the warning when
redefining a builtin macro. However, at that time, there was no
scenario for the __LINE__ macro.

But when we try to build a live-patch, we compare sections by using
-ffunction-sections. Some identical functions are considered changed
because of the __LINE__ macro.

At present, to detect such a change caused by the __LINE__ macro, we
have to analyse code and maintain a function list. For example,
in kpatch, check this commit
github.com/dynup/kpatch/commit/0e1b95edeafa36edb7bcf11da6d1c00f76d7e03d.

So, in this scenario, when we try to compare sections, it would
be better to support suppressing builtin macro redefined warnings for
the __LINE__ macro.
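
A minimal sketch of the effect described above. The two functions are
hypothetical and have textually identical bodies, yet they return
different values because __LINE__ expands to the line each one sits on.

```c
/* Hypothetical functions with identical source text in their bodies. */
int first_line(void)  { return __LINE__; }
int second_line(void) { return __LINE__; }
```

Inserting or removing a line above either function changes the constant
it returns, which is why a section comparison can report an otherwise
untouched function as changed.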

libcpp:
* init.cc (builtin_array): Do not always warn for a redefinition
of __LINE__.

gcc/testsuite

* gcc.dg/builtin-redefine.c: Test for redefinition warnings
for __LINE__.
* gcc.dg/builtin-redefine-1.c: New test.

gcc: Use ld -r when checking for HAVE_LD_RO_RW_SECTION_MIXING

Fall back to ld -r if ld -shared fails during configure. The check for
HAVE_LD_RO_RW_SECTION_MIXING can fail on targets where ld does not
support shared objects, even though the answer to the test should be
'read-write'. One such target is riscv64-unknown-elf. Failing this test
results in a libgcc crtbegin.o which has a writable .eh_frame section
leading to the default linker scripts placing the .eh_frame section in a
writable memory segment, or a linker warning when using ld scripts that
place .eh_frame unconditionally in ROM.

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Use ld -r in the check for HAVE_LD_RO_RW_SECTION_MIXING

libsanitizer: update LOCAL_PATCHES revision

libsanitizer/ChangeLog:

* LOCAL_PATCHES: Update revision.

libsanitizer: Apply local patches

libsanitizer: merge from upstream (87e6e490e79384a5)

Remove duplicate constants created between passes

There is no need to re-create constant literals between passes.
This patch creates a constant pool and reuses a constant literal
providing it is created at the same location. This in turn avoids
generating duplicate overflow error messages when encountering an
out of range constant literal.

gcc/m2/ChangeLog:

* gm2-compiler/SymbolTable.mod (ConstLitPoolEntry): New
pointer to record.
(ConstLitSym): New field RangeError.
(ConstLitPoolTree): New SymbolTree representing name to
index.
(ConstLitArray): New dynamic array containing pointers
to a ConstLitPoolEntry.
(CreateConstLit): New procedure function.
(LookupConstLitPoolEntry): New procedure function.
(AddConstLitPoolEntry): New procedure function.
(MakeConstLit): Re-implemented to check the constant lit
pool before calling CreateConstLit.
* m2.flex: Add ability to decode binary constant literals.

gcc/testsuite/ChangeLog:

* gm2/pim/run/pass/constlitbase.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Daily bump.

reload: Handle generating reloads that also clobber flags

* reload1.cc (emit_insn_if_valid_for_reload_1): Rename from
emit_insn_if_valid_for_reload.
(emit_insn_if_valid_for_reload): Call new helper, and if a SET fails
to be recognized, also try emitting a parallel that clobbers
TARGET_FLAGS_REGNUM, as applicable.

[xstormy16] Efficient HImode rotate left by a single bit.

This patch contains some minor tweaks to xstormy16's machine description,
most significantly providing a pattern for HImode rotate left by a single
bit that requires only two instructions.

unsigned short foo(unsigned short x)
{
  return (x << 1) | (x >> 15);
}

currently with -O2 generates:
foo:    mov r7,r2
        shr r7,#15
        shl r2,#1
        or r2,r7
        ret

with this patch, GCC now generates:
foo: shl r2,#1 | adc r2,#0
        ret

Additionally neghi2 is converted to a define_insn (so that the RTL
optimizers see the negation semantics), and HImode rotations by
8-bits can now be recognized and implemented using swpb.
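
For reference, a sketch of the kind of source that expresses a 16-bit
rotation by 8 bits, which is equivalent to swapping the two bytes. The
function name is hypothetical and this is not one of the patch's tests.

```c
#include <stdint.h>

/* Hypothetical helper: rotate a 16-bit value by 8 bits.  Rotating left
   or right by 8 gives the same result, a byte swap. */
uint16_t rotate_by_8(uint16_t x)
{
  return (uint16_t)((x << 8) | (x >> 8));
}
```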

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/stormy16/stormy16.md (neghi2): Convert from a define_expand
to a define_insn.
(*rotatehi_1): New define_insn for efficient 2 insn sequence.
(*rotatehi_8, *rotaterthi_8): New define_insn to emit a swpb.

gcc/testsuite/ChangeLog
* gcc.target/xstormy16/neghi2.c: New test case.
* gcc.target/xstormy16/rotatehi-1.c: Likewise.

[xstormy16] Recognize/support swpn (swap nibbles) instruction.

This patch adds support for xstormy16's swap nibbles instruction (swpn).
For the test case:

short foo(short x) {
  return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
}

GCC with -O2 currently generates the nine instruction sequence:
foo:    mov r7,r2
        asr r2,#4
        and r2,#15
        mov.w r6,#-256
        and r6,r7
        or r2,r6
        shl r7,#4
        and r7,#255
        or r2,r7
        ret

with this patch, we now generate:
foo:    swpn r2
        ret

To achieve this using combine's four instruction "combinations" requires
a little wizardry.  Firstly, define_insn_and_split are introduced to
treat logical shifts followed by bitwise-AND as macro instructions that
are split after reload.  This is sufficient to recognize a QImode
nibble swap, which can be implemented by swpn followed by either a
zero-extension or a sign-extension from QImode to HImode.  Then finally,
in the correct context, a QImode swap-nibbles pattern can be combined to
preserve the high-byte of a HImode word, matching the xstormy16's swpn
semantics.  The naming of the new code iterators is taken from i386.md.
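
The QImode building block mentioned above can be pictured with a sketch
like the following. The function name is hypothetical and the code is an
illustration of a byte-wide nibble swap, not a copy of the new tests.

```c
/* Hypothetical helper: swap the two nibbles of a byte. */
unsigned char swap_nibbles(unsigned char x)
{
  /* The shifts happen after promotion to int; the cast keeps 8 bits. */
  return (unsigned char)((x << 4) | (x >> 4));
}
```

Returning the result as an unsigned type corresponds to the
zero-extension case; a signed return type would correspond to the
sign-extension case.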

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/stormy16/stormy16.md (any_lshift): New code iterator.
(any_or_plus): Likewise.
(any_rotate): Likewise.
(*<any_lshift>_and_internal): New define_insn_and_split to
recognize a logical shift followed by an AND, and split it
again after reload.
(*swpn): New define_insn matching xstormy16's swpn.
(*swpn_zext): New define_insn recognizing swpn followed by
zero_extendqihi2, i.e. with the high byte set to zero.
(*swpn_sext): Likewise, for swpn followed by cbw.
(*swpn_sext_2): Likewise, for an alternate RTL form.
(*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior
sequence is split in the correct place to recognize the *swpn_zext
followed by any_or_plus (ior, xor or plus) instruction.

gcc/testsuite/ChangeLog
* gcc.target/xstormy16/swpn-1.c: New QImode test case.
* gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
* gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
* gcc.target/xstormy16/swpn-4.c: New HImode test case.

add glibc-stdint.h to vax and lm32 linux target (PR target/105525)

PR target/105525 is a build regression for the vax and lm32 linux
targets present in gcc-12/13/head, where the builds fail due to
unsatisfied references to __INTPTR_TYPE__ and __UINTPTR_TYPE__,
caused by these two targets failing to provide glibc-stdint.h.

Fixed thusly, tested by building crosses, which now succeeds.

Ok for trunk? (Note I don't have commit rights.)

PR target/105525
gcc/
* config.gcc (vax-*-linux*): Add glibc-stdint.h.
(lm32-*-uclinux*): Likewise.

Adjust mips test for recent ifcvt costing changes

MIPS ports have been failing a few tests since the change to add cost
checks in another path through the if-converter pass.

As with the other ports, these look like cases where we don't do good
costing in the MIPS port.  Someone who cares about MIPS will need to
fix this properly.

In the meantime this patch adjusts the branch cost when running the
two affected tests and skips them at -Os.  This is enough to verify
that if conversion can still happen if the costs are adjusted.

gcc/testsuite
* gcc.target/mips/mips-ps-type-2.c: Adjust branch cost to
encourage if-conversion.  Skip for -Os.
* gcc.target/mips/movcc-3.c: Similarly.

RISC-V: decouple stack allocation for rv32e w/o save-restore

Currently in rv32e, stack allocation for GPR callee-saved registers is
always 12 bytes w/o save-restore. Actually, for the case without save-restore,
less stack memory can be reserved. This patch decouples stack allocation for
rv32e w/o save-restore and makes riscv_compute_frame_info more readable.

output of testcase rv32e_stack.c
before patch:
addi sp,sp,-16
sw ra,12(sp)
call getInt
sw a0,0(sp)
lw a0,0(sp)
call PrintInts
lw a5,0(sp)
mv a0,a5
lw ra,12(sp)
addi sp,sp,16
jr ra

after patch:
addi sp,sp,-8
sw ra,4(sp)
call getInt
sw a0,0(sp)
lw a0,0(sp)
call PrintInts
lw a5,0(sp)
mv a0,a5
lw ra,4(sp)
addi sp,sp,8
jr ra

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_avoid_save_libcall): helper function
for riscv_use_save_libcall.
(riscv_use_save_libcall): call riscv_avoid_save_libcall.
(riscv_compute_frame_info): restructure to decouple stack allocation
for rv32e w/o save-restore.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_stack.c: New test.

Daily bump.

testsuite: Handle empty assembly lines in check-function-bodies

I tried to make use of check-function-bodies for cris-elf and was a
bit surprised to see it failing.  There's a deliberate empty line
after the filled delay slot of the return-function which was
mishandled.  I thought "aha" and tried to add an empty line
(containing just a "**" prefix) to the match, but that didn't help.
While it was added as input from the function's assembly output
to-be-matched like any other line, it couldn't be matched: I had to
use "...", which works but is...distracting.

Some digging shows that an empty assembly line can't be deliberately
matched because all matcher lines (lines starting with the prefix,
the ubiquitous "**") are canonicalized by trimming leading
whitespace (the "string trim" in check-function-bodies) and instead
adding a leading TAB character, thus empty lines end up containing
just a TAB.  For usability it's better to treat empty lines as fluff
than to uglify the test-case and the code to properly match them.
Double-checking, no test-case tries to match a line containing just
TAB (by providing a line containing just "**\s*", i.e. zero or
more whitespace characters).

* lib/scanasm.exp (parse_function_bodies): Set fluff to include
empty lines (besides optionally leading whitespace).

Fix autoprofiledbootstrap build

1. Fix gcov version
2. Merge perf data collected when compiling the compiler and runtime libraries
3. Fix documentation typo

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.in: Define PROFILE_MERGER
* Makefile.tpl: Define PROFILE_MERGER

gcc/c/ChangeLog:

* Make-lang.in: Merge perf data collected when compiling cc1 and runtime libraries

gcc/cp/ChangeLog:

* Make-lang.in: Merge perf data collected when compiling cc1plus and runtime libraries

gcc/lto/ChangeLog:

* Make-lang.in: Merge perf data collected when compiling lto1 and runtime libraries

gcc/ChangeLog:

* doc/install.texi: Fix documentation typo

RISC-V: Add divmod expansion support

Hi all,
If we have division and remainder calculations with the same operands:

  a = b / c;
  d = b % c;

We can replace the calculation of remainder with multiplication +
subtraction, using the result from the previous division:

  a = b / c;
  d = a * c;
  d = b - d;

Which will be faster.
Currently, it isn't done for RISC-V.

I've added an expander for DIVMOD which replaces 'rem' with 'mul + sub'.
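
As a self-contained sketch of the same rewrite at the C level (the struct
and function names are hypothetical), the remainder is recovered from the
quotient with one multiplication and one subtraction:

```c
/* Hypothetical result type holding both values. */
struct divmod_result { int quot; int rem; };

/* Compute b / c once, then derive b % c as b - (b / c) * c. */
struct divmod_result divmod_mul_sub(int b, int c)
{
  struct divmod_result r;
  r.quot = b / c;            /* single division */
  r.rem  = b - r.quot * c;   /* multiply + subtract instead of rem */
  return r;
}
```

Because C division truncates toward zero, the derived remainder matches
the `%` operator for negative operands as well.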

Best regards,
Matevos.

gcc/ChangeLog:

* config/riscv/iterators.md (only_div, paired_mod): New iterators.
(u): Add div/udiv cases.
* config/riscv/riscv-protos.h (riscv_use_divmod_expander): Prototype.
* config/riscv/riscv.cc (struct riscv_tune_param): Add field for
divmod expansion.
(rocket_tune_info, sifive_7_tune_info): Initialize new field.
(thead_c906_tune_info): Likewise.
(optimize_size_tune_info): Likewise.
(riscv_use_divmod_expander): New function.
* config/riscv/riscv.md (<u>divmod<mode>4): New expander.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/divmod-1.c: New testcase.
* gcc.target/riscv/divmod-2.c: New testcase.

RISC-V: Added support clmul[r,h] instructions for Zbc extension.

The clmul[h] instructions were previously available only with the ZBKC
extension. This patch makes them available with the ZBC extension too.
It also adds support for the 'clmulr' instruction in the ZBC extension.
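
For readers unfamiliar with the operations, here is a portable software
model of the three carry-less multiplies for a 32-bit register width. It
follows the bitwise definitions as understood from the Zbc specification.
The function names are hypothetical and this is not the compiler's
implementation.

```c
#include <stdint.h>

/* Low 32 bits of the carry-less product. */
uint32_t clmul32(uint32_t rs1, uint32_t rs2)
{
  uint32_t out = 0;
  for (int i = 0; i < 32; i++)
    if ((rs2 >> i) & 1)
      out ^= rs1 << i;
  return out;
}

/* High 32 bits of the carry-less product. */
uint32_t clmulh32(uint32_t rs1, uint32_t rs2)
{
  uint32_t out = 0;
  for (int i = 1; i < 32; i++)
    if ((rs2 >> i) & 1)
      out ^= rs1 >> (32 - i);
  return out;
}

/* Bits 62..31 of the carry-less product (the "reversed" variant). */
uint32_t clmulr32(uint32_t rs1, uint32_t rs2)
{
  uint32_t out = 0;
  for (int i = 0; i < 31; i++)
    if ((rs2 >> i) & 1)
      out ^= rs1 >> (32 - i - 1);
  return out;
}
```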

gcc/ChangeLog:

* config/riscv/bitmanip.md: Added clmulr instruction.
* config/riscv/riscv-builtins.cc (AVAIL): Add new.
* config/riscv/riscv.md: (UNSPEC_CLMULR): Add new unspec type.
(type): Add clmul
* config/riscv/riscv-cmo.def: Added built-in function for clmulr.
* config/riscv/crypto.md: Move clmul[h] instructions to bitmanip.md.
* config/riscv/riscv-scalar-crypto.def: Move clmul[h] built-in
functions to riscv-cmo.def.
* config/riscv/generic.md: Add clmul to list of instructions
using the generic_imul reservation.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbc32.c: New test.
* gcc.target/riscv/zbc64.c: New test.

RISC-V: Eliminate redundant zero extension of minu/maxu operands

On RV64 the following code:

  unsigned Min(unsigned a, unsigned b) {
      return a < b ? a : b;
  }

Compiles to:
  Min:
       zext.w  a1,a1
       zext.w  a0,a0
       minu    a0,a1,a0
       sext.w  a0,a0
       ret

This patch removes unnecessary zero extensions of minu/maxu operands.

gcc/ChangeLog:

* config/riscv/bitmanip.md: Added expanders for minu/maxu instructions

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max-02.c: Updated scanning check.
* gcc.target/riscv/zbb-min-max-03.c: New tests.

contrib: port doxygen script to Python3

contrib/ChangeLog:

* filter_gcc_for_doxygen: Use python3 and not python2.
* filter_params.py: Likewise.

PHIOPT: Move two_value_replacement to match.pd

This patch converts two_value_replacement function
into a match.pd pattern.
It is a direct translation with only one minor change:
it does not check for the {0,+-1} case as that is handled
before in match.pd so there is no reason to do the extra
check for it.
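
The text above only names the shape "a !=/== CST1 ? CST2 : CST3". As a
hedged sketch, assume the interesting case is one where a is known to
take just two adjacent values and the two results are also adjacent (an
assumption, not stated in this message). The conditional then equals a
simple addition. Function names are hypothetical.

```c
/* Assumption: callers only pass x == 1 or x == 2. */
int two_value_cond(int x)
{
  return x == 1 ? 10 : 11;
}

/* Arithmetic form that agrees with the conditional on {1, 2}. */
int two_value_arith(int x)
{
  return x + 9;
}
```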

OK? Bootstrapped and tested on x86_64-linux-gnu with
no regressions.

gcc/ChangeLog:

PR tree-optimization/100958
* tree-ssa-phiopt.cc (two_value_replacement): Remove.
(pass_phiopt::execute): Don't call two_value_replacement.
* match.pd (a !=/== CST1 ? CST2 : CST3): Add pattern to
handle what two_value_replacement did.

MATCH: Add patterns from phiopt's minmax_replacement

This adds a few patterns from phiopt's minmax_replacement
for (A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C> .
It is progress to remove minmax_replacement from phiopt.
There are still some more cases dealing with constants on the
edges (0/INT_MAX) to handle in match.
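
A sketch of the source shape for the MIN case with CMP being "<". The
helper names are hypothetical, and the folded form shown is one natural
simplification, given here as an illustration rather than as the exact
output of the new patterns.

```c
/* Hypothetical helper. */
int min_int(int a, int b) { return a < b ? a : b; }

/* (A < B) ? MIN<A, C> : MIN<B, C> */
int min_branchy(int a, int b, int c)
{
  return a < b ? min_int(a, c) : min_int(b, c);
}

/* Equivalent value without the outer conditional. */
int min_folded(int a, int b, int c)
{
  return min_int(min_int(a, b), c);
}
```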

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd: Add patterns for
"(A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C>".

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/minmax-16.c: Update testcase slightly.
* gcc.dg/tree-ssa/split-path-1.c: Also disable tree-loop-if-convert
as that now does the combining.

MATCH: Factor out code for min/max detection with constants

This factors out some of the code from the min/max detection
from match.pd into a function so it can be reused in other
places. This is mainly used to detect the conversions
of >= to > which causes the integer values to be changed by
one.
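
To make the off-by-one point concrete, here is a small sketch with
hypothetical function names. Both functions compute MAX(x, 5), but the
comparison constant differs by one depending on whether ">=" or ">" is
used.

```c
/* Written with >=: comparison constant equals the result constant. */
int max5_ge(int x)
{
  return x >= 5 ? x : 5;
}

/* Same function after >= 5 becomes > 4: the constants now differ by one. */
int max5_gt(int x)
{
  return x > 4 ? x : 5;
}
```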

Changes since v1:
* factor out the checks for INTEGER_CSTs so it is more obvious.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd: Factor out deciding the min/max from
the "(cond (cmp (convert1? x) c1) (convert2? x) c2)"
pattern to ...
* fold-const.cc (minmax_from_comparison): this new function.
* fold-const.h (minmax_from_comparison): New prototype.

PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG.

This patch fixes PR rtl-optimization/109476, which is a code quality
regression affecting AVR.  The cause is that the lower-subreg pass is
sometimes overly aggressive, lowering the LSHIFTRT below:

(insn 7 4 8 2 (set (reg:HI 51)
        (lshiftrt:HI (reg/v:HI 49 [ b ])
            (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
     (nil))

into a pair of QImode SUBREG assignments:

(insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
        (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))
(insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
        (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
     (nil))

but this idiom, SETs of SUBREGs, interferes with combine's ability
to associate/fuse instructions.  The solution, on targets that
have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass
wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false),
is to split/lower LSHIFTRT to a ZERO_EXTEND.
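
At the source level the equivalence being exploited looks like the
following sketch. The function names are hypothetical and the actual
test case may differ. A 16-bit logical shift right by 8 yields the high
byte zero-extended to 16 bits.

```c
#include <stdint.h>

/* The shift form, corresponding to the LSHIFTRT by 8 above. */
uint16_t high_byte_shift(uint16_t b)
{
  return (uint16_t)(b >> 8);
}

/* The zero-extension form: take the high byte, then widen it. */
uint16_t high_byte_zext(uint16_t b)
{
  uint8_t hi = (uint8_t)(b >> 8);  /* models the QImode high part */
  return (uint16_t)hi;             /* zero-extend to 16 bits */
}
```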

To answer Richard's question in comment #10 of the bugzilla PR,
the function resolve_shift_zext is called with one of four RTX
codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with
LSHIFTRT can the setting of low_part and high_part SUBREGs be
replaced by a ZERO_EXTEND.  For ASHIFTRT, we require a sign
extension, so don't set the high_part to zero; if we're splitting
a ZERO_EXTEND then it doesn't make sense to replace it with a
ZERO_EXTEND, and for ASHIFT we've played games to swap the
high_part and low_part SUBREGs, so that we assign the low_part
to zero (for double word shifts by greater than word size bits).

2023-04-28  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR rtl-optimization/109476
* lower-subreg.cc: Include explow.h for force_reg.
(find_decomposable_shift_zext): Pass an additional SPEED_P argument.
If decomposing a suitable LSHIFTRT and we're not splitting
ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND
instead of setting a high part SUBREG to zero, which helps combine.
(decompose_multiword_subregs): Update call to resolve_shift_zext.

gcc/testsuite/ChangeLog
PR rtl-optimization/109476
* gcc.target/avr/mmcu/pr109476.c: New test case.

Synchronize include/ctf.h with upstream binutils/libctf.

This patch updates include/ctf.h to match the current libctf version in
binutils' include/. I recently attempted to build a uber tree (following
some notes that are so old they used CVS) and noticed that binutils won't
build with gcc's top-level include, due to CTF_F_IDXSORTED not being
defined in ctf.h.

2023-04-28 Roger Sayle <roger@nextmovesoftware.com>

include/ChangeLog
* ctf.h: Import latest version from binutils/libctf.

Add emulated scatter capability to the vectorizer

This adds a scatter vectorization capability to the vectorizer
without target support by decomposing the offset and data vectors
and then performing scalar stores in the order of vector lanes.
This is aimed at cases where vectorizing the rest of the loop
offsets the cost of vectorizing the scatter.

The offset load is still vectorized and costed as such, but like
with emulated gather those will be turned back to scalar loads
by forwprop.
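
A scalar model of the idea, written as plain C. LANES, the function name
and the doubling operation are hypothetical choices for illustration; the
vectorizer works on its own IR, not on code like this. Offsets and data
are handled a vector's worth at a time, then stored one lane at a time
in lane order.

```c
#define LANES 4  /* hypothetical vector width */

/* dst[idx[i]] = src[i] * 2 for i in [0, n), emulating a scatter. */
void scatter_store(float *dst, const int *idx, const float *src, int n)
{
  int i = 0;
  for (; i + LANES <= n; i += LANES)
    {
      int off[LANES];
      float val[LANES];
      /* "Vector" part: load offsets and compute the data. */
      for (int l = 0; l < LANES; l++)
        {
          off[l] = idx[i + l];
          val[l] = src[i + l] * 2.0f;
        }
      /* Decomposed part: scalar stores in the order of the lanes. */
      for (int l = 0; l < LANES; l++)
        dst[off[l]] = val[l];
    }
  /* Scalar tail for the remaining elements. */
  for (; i < n; i++)
    dst[idx[i]] = src[i] * 2.0f;
}
```

Storing in lane order keeps the result identical to the scalar loop when
two lanes target the same index.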

* tree-vect-data-refs.cc (vect_analyze_data_refs): Always
consider scatters.
* tree-vect-stmts.cc (vect_model_store_cost): Pass in the
gather-scatter info and cost emulated scatters accordingly.
(get_load_store_type): Support emulated scatters.
(vectorizable_store): Likewise. Emulate them by extracting
scalar offsets and data, doing scalar stores.

* gcc.dg/vect/pr25413a.c: Un-XFAIL everywhere.
* gcc.dg/vect/vect-71.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s4113.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s491.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-vas.c: Likewise.

Adjust costing of emulated vectorized gather/scatter

Emulated gather/scatter behave similarly to strided elementwise
accesses in that they need to decompose the offset vector
and construct or decompose the data vector, so handle them
the same way, pessimizing the cases with many elements.

For pr88531-2c.c instead of

.L4:
        leaq    (%r15,%rcx), %rdx
        incl    %edi
        movl    16(%rdx), %r13d
        movl    24(%rdx), %r14d
        movl    (%rdx), %r10d
        movl    4(%rdx), %r9d
        movl    8(%rdx), %ebx
        movl    12(%rdx), %r11d
        movl    20(%rdx), %r12d
        vmovss  (%rax,%r14,4), %xmm2
        movl    28(%rdx), %edx
        vmovss  (%rax,%r13,4), %xmm1
        vmovss  (%rax,%r10,4), %xmm0
        vinsertps       $0x10, (%rax,%rdx,4), %xmm2, %xmm2
        vinsertps       $0x10, (%rax,%r12,4), %xmm1, %xmm1
        vinsertps       $0x10, (%rax,%r9,4), %xmm0, %xmm0
        vmovlhps        %xmm2, %xmm1, %xmm1
        vmovss  (%rax,%rbx,4), %xmm2
        vinsertps       $0x10, (%rax,%r11,4), %xmm2, %xmm2
        vmovlhps        %xmm2, %xmm0, %xmm0
        vinsertf128     $0x1, %xmm1, %ymm0, %ymm0
        vmulps  %ymm3, %ymm0, %ymm0
        vmovups %ymm0, (%r8,%rcx)
        addq    $32, %rcx
        cmpl    %esi, %edi
        jb      .L4

we now prefer

.L4:
        leaq    0(%rbp,%rdx,8), %rcx
        movl    (%rcx), %r10d
        movl    4(%rcx), %ecx
        vmovss  (%rsi,%r10,4), %xmm0
        vinsertps       $0x10, (%rsi,%rcx,4), %xmm0, %xmm0
        vmulps  %xmm1, %xmm0, %xmm0
        vmovlps %xmm0, (%rbx,%rdx,8)
        incq    %rdx
        cmpl    %edi, %edx
        jb      .L4

* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Tame down element extracts and scalar loads for gather/scatter
similar to elementwise strided accesses.

* gcc.target/i386/pr89618-2.c: New testcase.
* gcc.target/i386/pr88531-2b.c: Adjust.
* gcc.target/i386/pr88531-2c.c: Likewise.

RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

When some RVV integer compare operators act on the same vector
register without a mask, they can be simplified to VMCLR.

This patch allows ne, lt, ltu, gt and gtu to perform this kind
of simplification by adding one new define_split.

Given we have:
vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v24,0(a1)
vmslt.vv v8,v24,v24
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf8,ta,ma
vmclr.m v24                    <- optimized to vmclr.m
vsetvli zero,a5,e8,mf8,ta,ma
vsm.v   v24,0(a0)
ret

As above, we may have one instruction eliminated and require fewer
vector registers.

gcc/ChangeLog:

* config/riscv/vector.md: Add new define split to perform
the simplification.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: kito-cheng <kito.cheng@sifive.com>

libstdc++: Improve doxygen docs for <random>

Add @headerfile and @since tags. Add gamma_distribution to the correct
group (poisson distributions). Add a group for the sampling
distributions and add the missing definitions of their probability
functions. Add uniform_int_distribution back to the uniform
distributions group.

libstdc++-v3/ChangeLog:

* include/bits/random.h (gamma_distribution): Add to the right
doxygen group.
(discrete_distribution, piecewise_constant_distribution)
(piecewise_linear_distribution): Create a new doxygen group and
fix the incomplete doxygen comments.
* include/bits/uniform_int_dist.h (uniform_int_distribution):
Add to doxygen group.

libstdc++: Minor fixes to doxygen comments

libstdc++-v3/ChangeLog:

* include/bits/uses_allocator.h: Add missing @file comment.
* include/bits/regex.tcc: Remove stray doxygen comments.
* include/experimental/memory_resource: Likewise.
* include/std/bit: Tweak doxygen @cond comments.
* include/std/expected: Likewise.
* include/std/numbers: Likewise.

libstdc++: Strip absolute paths from files shown in Doxygen docs

This avoids showing absolute paths from the expansion of
@srcdir@/libsupc++/ in the doxygen File List view.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (STRIP_FROM_PATH): Remove prefixes
from header paths.