git.ipfire.org Git - thirdparty/gcc.git/log

Daily bump.

Convert ipcp_vr_lattice to type agnostic framework.

This converts the lattice to store ranges in Value_Range instead of
value_range (*) to make it type agnostic, and adjust all users
accordingly.

I've been careful to make sure Value_Range never ends up on GC, since
it contains an int_range_max and can expand on-demand onto the heap.
Longer term storage for ranges should be done with vrange_storage, as
per the previous patch ("Provide an API for ipa_vr").

gcc/ChangeLog:

* ipa-cp.cc (ipcp_vr_lattice::init): Take type argument.
(ipcp_vr_lattice::print): Call dump method.
(ipcp_vr_lattice::meet_with): Adjust for m_vr being a
Value_Range.
(ipcp_vr_lattice::meet_with_1): Make argument a reference.
(ipcp_vr_lattice::set_to_bottom): Set varying for an unsupported
range.
(initialize_node_lattices): Pass type when appropriate.
(ipa_vr_operation_and_type_effects): Make type agnostic.
(ipa_value_range_from_jfunc): Same.
(propagate_vr_across_jump_function): Same.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
(evaluate_properties_for_edge): Same.
* ipa-prop.cc (ipa_vr::get_vrange): Same.
(ipcp_update_vr): Same.
* ipa-prop.h (ipa_value_range_from_jfunc): Same.
(ipa_range_set_and_normalize): Same.

testsuite: Cut down 27_io/basic_istream/.../94749.cc for simulators

The test wchar_t/94749.cc can take about 10 minutes on some
simulator/host combinations with char/94749.cc at a third of
that time. The cause is test05 which is quite heavy and
includes wrapping a 32-bit counter. Run it only for native
setups.

* testsuite/27_io/basic_istream/ignore/wchar_t/94749.cc (main)
[! SIMULATOR_TEST]: Also exclude running test05.
* testsuite/27_io/basic_istream/ignore/char/94749.cc: Ditto.

c++: Adjust conversion deduction [PR61663][DR976]

Drop the return type's reference before doing cvqual and related decays.

gcc/cp/
PR c++/61663
* pt.cc (maybe_adjust_types_for_deduction): Implement DR976.
gcc/testsuite/
* g++.dg/template/pr61663.C: New.

target/109650: Fix wrong code after cc0 -> CCmode transition.

This patch fixes a wrong-code bug in the wake of PR92729, the transition that
turned the AVR backend from cc0 to CCmode.  In cc0, the insn that uses cc0 like
a conditional branch always follows the cc0 setter, which is no more the case
with CCmode where set and use of REG_CC might be in different basic blocks.

This patch removes the machine-dependent reorg pass in avr_reorg entirely.

It is replaced by a new, AVR specific mini-pass that runs prior to split2.
Canonicalization of comparisons away from the "difficult" codes GT[U] and LE[U]
is now mostly performed by implementing TARGET_CANONICALIZE_COMPARISON.

Moreover:

* Text peephole conditions get "dead_or_set_regno_p (*, REG_CC)" as needed.

* RTL peephole conditions get "peep2_regno_dead_p (*, REG_CC)" as needed.

* Conditional branches no more clobber REG_CC.

* insn output for compares looks ahead to determine the branch mode in use.
  This needs also "dead_or_set_regno_p (*, REG_CC)".

* Add RTL peepholes for decrement-and-branch detection.

* Some of the patterns like "*cmphi.zero-extend.0" lost their
  combine-ational part wit PR92729.  Restore them.

Finally, it fixes some of the many indentation glitches left over from PR92729.

gcc/
PR target/109650
PR target/92729
* config/avr/avr-passes.def (avr_pass_ifelse): Insert new pass.
* config/avr/avr.cc (avr_pass_ifelse): New RTL pass.
(avr_pass_data_ifelse): New pass_data for it.
(make_avr_pass_ifelse, avr_redundant_compare, avr_cbranch_cost)
(avr_canonicalize_comparison, avr_out_plus_set_ZN)
(avr_out_cmp_ext): New functions.
(compare_condtition): Make sure REG_CC dies in the branch insn.
(avr_rtx_costs_1): Add computation of cbranch costs.
(avr_adjust_insn_length) [ADJUST_LEN_ADD_SET_ZN, ADJUST_LEN_CMP_ZEXT]:
[ADJUST_LEN_CMP_SEXT]Handle them.
(TARGET_CANONICALIZE_COMPARISON): New define.
(avr_simplify_comparison_p, compare_diff_p, avr_compare_pattern)
(avr_reorg_remove_redundant_compare, avr_reorg): Remove functions.
(TARGET_MACHINE_DEPENDENT_REORG): Remove define.
* config/avr/avr-protos.h (avr_simplify_comparison_p): Remove proto.
(make_avr_pass_ifelse, avr_out_plus_set_ZN, cc_reg_rtx)
(avr_out_cmp_zext): New Protos
* config/avr/avr.md (branch, difficult_branch): Don't split insns.
(*cbranchhi.zero-extend.0", *cbranchhi.zero-extend.1")
(*swapped_tst<mode>, *add.for.eqne.<mode>): New insns.
(*cbranch<mode>4): Rename to cbranch<mode>4_insn.
(define_peephole): Add dead_or_set_regno_p(insn,REG_CC) as needed.
(define_deephole2): Add peep2_regno_dead_p(*,REG_CC) as needed.
Add new RTL peepholes for decrement-and-branch and *swapped_tst<mode>.
Rework signtest-and-branch peepholes for *sbrx_branch<mode>.
(adjust_len) [add_set_ZN, cmp_zext]: New.
(QIPSI): New mode iterator.
(ALLs1, ALLs2, ALLs4, ALLs234): New mode iterators.
(gelt): New code iterator.
(gelt_eqne): New code attribute.
(rvbranch, *rvbranch, difficult_rvbranch, *difficult_rvbranch)
(branch_unspec, *negated_tst<mode>, *reversed_tst<mode>)
(*cmpqi_sign_extend): Remove insns.
(define_c_enum "unspec") [UNSPEC_IDENTITY]: Remove.
* config/avr/avr-dimode.md (cbranch<mode>4): Canonicalize comparisons.
* config/avr/predicates.md (scratch_or_d_register_operand): New.
* config/avr/constraints.md (Yxx): New constraint.

gcc/testsuite/
PR target/109650
* gcc.target/avr/torture/pr109650-1.c: New test.
* gcc.target/avr/torture/pr109650-2.c: New test.

Fortran: add Fortran 2018 IEEE_{MIN,MAX} functions

libgfortran/

* ieee/ieee_arithmetic.F90: Add IEEE_MIN_NUM, IEEE_MAX_NUM,
IEEE_MIN_NUM_MAG, and IEEE_MAX_NUM_MAG functions.

gcc/fortran/

* f95-lang.cc (gfc_init_builtin_functions): Add fmax() and
fmin() built-ins, and their variants.
* mathbuiltins.def: Add FMAX and FMIN built-ins.
* trans-intrinsic.cc (conv_intrinsic_ieee_minmax): New function.
(gfc_conv_ieee_arithmetic_function): Handle IEEE_MIN_NUM and
IEEE_MAX_NUM functions.

gcc/testsuite/
* gfortran.dg/ieee/minmax_1.f90: New test.

libatomic: x86_64: Always try ifunc

We used to skip ifunc check when CX16 is available. But now we use
CX16+AVX+Intel/AMD for the "perfect" 16b load implementation, so CX16
alone is not a sufficient reason not to use ifunc (see PR104688).

This causes a subtle and annoying issue: when GCC is built with a
higher -march= setting in CFLAGS_FOR_TARGET, ifunc is disabled and
the worst (locked) implementation of __atomic_load_16 is always used.

There seems no good way to check if the CPU is Intel or AMD from
the built-in macros (maybe we can check every known model like __skylake,
__bdver2, ..., but it will be very error-prune and require an update
whenever we add the support for a new x86 model). The best thing we can
do seems "always try ifunc" here.

libatomic/ChangeLog:

* configure.tgt: For x86_64, always set try_ifunc=yes.

testsuite: Add more allocation size tests for conjured svalues [PR110014]

This patch adds the reproducers reported in PR 110014 as test cases. The
false positives in those cases are already fixed with PR 109577.

2023-06-09 Tim Lange <mail@tim-lange.me>

PR analyzer/110014

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/realloc-pr110014.c: New tests.

analyzer: Fix allocation size false positive on conjured svalue [PR109577]

Currently, the analyzer tries to prove that the allocation size is a
multiple of the pointee's type size.  This patch reverses the behavior
to try to prove that the expression is not a multiple of the pointee's
type size.  With this change, each unhandled case should be gracefully
considered as correct.  This fixes the bug reported in PR 109577 by
Paul Eggert.

Regression-tested on Linux x86-64 with -m32 and -m64.

2023-06-09  Tim Lange  <mail@tim-lange.me>

PR analyzer/109577

gcc/analyzer/ChangeLog:

* constraint-manager.cc (class sval_finder): Visitor to find
childs in svalue trees.
(constraint_manager::sval_constrained_p): Add new function to
check whether a sval might be part of an constraint.
* constraint-manager.h: Add sval_constrained_p function.
* region-model.cc (class size_visitor): Reverse behavior to not
emit a warning on not explicitly considered cases.
(region_model::check_region_size):
Adapt to size_visitor changes.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/allocation-size-2.c: Change expected output
and add new test case.
* gcc.dg/analyzer/pr109577.c: New test.

RISC-V: Add test cases for RVV FP16 vreinterpret

This patch would like to add more tests for RVV FP16 vreinterpret, aka

vfloat16*_t <==> v{u}int16*_t.

There we allow FP16 vreinterpret in ZVFHMIN consider we have vle FP16 already.
It doesn't break anything in SPEC as there is no such vreinterpret insn.
From the user's perspective, it is reasonable to do some type convert
between vfloat16 and v{u}int16 when only ZVFHMIN is enabled.

This patch would like to add new test cases to make sure the RVV FP16
vreinterpret works well as expected.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add new cases.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Diito.

RISC-V: Enable select_vl for RVV auto-vectorization

Consider this following example:
void vec_add(int32_t *restrict c, int32_t *restrict a, int32_t *restrict b,
             int N) {
  for (long i = 0; i < N; i++) {
    c[i] = a[i] + b[i];
  }
}

After this patch:
vec_add:
        ble     a3,zero,.L5
.L3:
        vsetvli a5,a3,e32,m1,ta,ma
        vle32.v v2,0(a1)
        vle32.v v1,0(a2)
        vsetvli a6,zero,e32,m1,ta,ma ===> redundant vsetvl.
        slli    a4,a5,2
        vadd.vv v1,v1,v2
        sub     a3,a3,a5
        vsetvli zero,a5,e32,m1,ta,ma ===> redundant vsetvl.
        vse32.v v1,0(a0)
        add     a1,a1,a4
        add     a2,a2,a4
        add     a0,a0,a4
        bne     a3,zero,.L3
.L5:
        ret

We can get close-to-optimal codegen but with some redundant vsetvls.
This is not the big issue which will be easily addressed in RISC-V backend.

I am going to add a standalone PASS "AVL propagation" (avlprop) to addresse
such issue.

gcc/ChangeLog:

* config/riscv/autovec.md (select_vl<mode>): New pattern.
* config/riscv/riscv-protos.h (expand_select_vl): New function.
* config/riscv/riscv-v.cc (expand_select_vl): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Adapt test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-1.c: New test.

Unify MULT_EXPR range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (foperator_mult_div_base): Delete.
(foperator_mult_div_base::find_range): Make static local function.
(foperator_mult): Remove. Move prototypes to range-op-mixed.h
(operator_mult::op1_range): Rename from foperator_mult.
(operator_mult::op2_range): Ditto.
(operator_mult::rv_fold): Ditto.
(float_table::float_table): Remove MULT_EXPR.
(class foperator_div): Inherit from range_operator.
(float_table::float_table): Delete.
* range-op-mixed.h (class operator_mult): Combined from integer
and float files.
* range-op.cc (float_tree_table): Delete.
(op_mult): New object.
(unified_table::unified_table): Add MULT_EXPR.
(get_op_handler): Do not check float table any longer.
(class cross_product_operator): Move to range-op-mixed.h.
(class operator_mult): Move to range-op-mixed.h.
(integral_table::integral_table): Remove MULT_EXPR.
(pointer_table::pointer_table): Remove MULT_EXPR.
* range-op.h (float_table): Remove.

Unify NEGATE_EXPR range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (foperator_negate): Remove. Move prototypes
to range-op-mixed.h
(operator_negate::fold_range): Rename from foperator_negate.
(operator_negate::op1_range): Ditto.
(float_table::float_table): Remove NEGATE_EXPR.
* range-op-mixed.h (class operator_negate): Combined from integer
and float files.
* range-op.cc (op_negate): New object.
(unified_table::unified_table): Add NEGATE_EXPR.
(class operator_negate): Move to range-op-mixed.h.
(integral_table::integral_table): Remove NEGATE_EXPR.
(pointer_table::pointer_table): Remove NEGATE_EXPR.

Unify MINUS_EXPR range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (foperator_minus): Remove. Move prototypes
to range-op-mixed.h
(operator_minus::fold_range): Rename from foperator_minus.
(operator_minus::op1_range): Ditto.
(operator_minus::op2_range): Ditto.
(operator_minus::rv_fold): Ditto.
(float_table::float_table): Remove MINUS_EXPR.
* range-op-mixed.h (class operator_minus): Combined from integer
and float files.
* range-op.cc (op_minus): New object.
(unified_table::unified_table): Add MINUS_EXPR.
(class operator_minus): Move to range-op-mixed.h.
(integral_table::integral_table): Remove MINUS_EXPR.
(pointer_table::pointer_table): Remove MINUS_EXPR.

Unify ABS_EXPR range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (foperator_abs): Remove. Move prototypes
to range-op-mixed.h
(operator_abs::fold_range): Rename from foperator_abs.
(operator_abs::op1_range): Ditto.
(float_table::float_table): Remove ABS_EXPR.
* range-op-mixed.h (class operator_abs): Combined from integer
and float files.
* range-op.cc (op_abs): New object.
(unified_table::unified_table): Add ABS_EXPR.
(class operator_abs): Move to range-op-mixed.h.
(integral_table::integral_table): Remove ABS_EXPR.
(pointer_table::pointer_table): Remove ABS_EXPR.

Unify PLUS_EXPR range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (foperator_plus): Remove. Move prototypes
to range-op-mixed.h
(operator_plus::fold_range): Rename from foperator_plus.
(operator_plus::op1_range): Ditto.
(operator_plus::op2_range): Ditto.
(operator_plus::rv_fold): Ditto.
(float_table::float_table): Remove PLUS_EXPR.
* range-op-mixed.h (class operator_plus): Combined from integer
and float files.
* range-op.cc (op_plus): New object.
(unified_table::unified_table): Add PLUS_EXPR.
(class operator_plus): Move to range-op-mixed.h.
(integral_table::integral_table): Remove PLUS_EXPR.
(pointer_table::pointer_table): Remove PLUS_EXPR.

Unify operator_cast range operator

Move the declaration of the class to the range-op-mixed header, and use it
in the new unified table.

* range-op-mixed.h (class operator_cast): Combined from integer
and float files.
* range-op.cc (op_cast): New object.
(unified_table::unified_table): Add op_cast
(class operator_cast): Move to range-op-mixed.h.
(integral_table::integral_table): Remove op_cast
(pointer_table::pointer_table): Remove op_cast.

Unify operator_cst range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (operator_cst::fold_range): New.
* range-op-mixed.h (class operator_cst): Move from integer file.
* range-op.cc (op_cst): New object.
(unified_table::unified_table): Add op_cst. Also use for REAL_CST.
(class operator_cst): Move to range-op-mixed.h.
(integral_table::integral_table): Remove op_cst.
(pointer_table::pointer_table): Remove op_cst.

Unify Identity range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (foperator_identity): Remove. Move prototypes
to range-op-mixed.h
(operator_identity::fold_range): Rename from foperator_identity.
(operator_identity::op1_range): Ditto.
(float_table::float_table): Remove fop_identity.
* range-op-mixed.h (class operator_identity): Combined from integer
and float files.
* range-op.cc (op_identity): New object.
(unified_table::unified_table): Add op_identity.
(class operator_identity): Move to range-op-mixed.h.
(integral_table::integral_table): Remove identity.
(pointer_table::pointer_table): Remove identity.

Unify GE_EXPR range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (foperator_ge): Remove. Move prototypes
to range-op-mixed.h
(operator_ge::fold_range): Rename from foperator_ge.
(operator_ge::op1_range): Ditto.
(float_table::float_table): Remove GE_EXPR.
* range-op-mixed.h (class operator_ge): Combined from integer
and float files.
* range-op.cc (op_ge): New object.
(unified_table::unified_table): Add GE_EXPR.
(class operator_ge): Move to range-op-mixed.h.
(ge_op1_op2_relation): Fold into
operator_ge::op1_op2_relation.
(integral_table::integral_table): Remove GE_EXPR.
(pointer_table::pointer_table): Remove GE_EXPR.
* range-op.h (ge_op1_op2_relation): Delete.

Unify GT_EXPR range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (foperator_gt): Remove. Move prototypes
to range-op-mixed.h
(operator_gt::fold_range): Rename from foperator_gt.
(operator_gt::op1_range): Ditto.
(float_table::float_table): Remove GT_EXPR.
* range-op-mixed.h (class operator_gt): Combined from integer
and float files.
* range-op.cc (op_gt): New object.
(unified_table::unified_table): Add GT_EXPR.
(class operator_gt): Move to range-op-mixed.h.
(gt_op1_op2_relation): Fold into
operator_gt::op1_op2_relation.
(integral_table::integral_table): Remove GT_EXPR.
(pointer_table::pointer_table): Remove GT_EXPR.
* range-op.h (gt_op1_op2_relation): Delete.

Unify LE_EXPR range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (foperator_le): Remove. Move prototypes
to range-op-mixed.h
(operator_le::fold_range): Rename from foperator_le.
(operator_le::op1_range): Ditto.
(float_table::float_table): Remove LE_EXPR.
* range-op-mixed.h (class operator_le): Combined from integer
and float files.
* range-op.cc (op_le): New object.
(unified_table::unified_table): Add LE_EXPR.
(class operator_le): Move to range-op-mixed.h.
(le_op1_op2_relation): Fold into
operator_le::op1_op2_relation.
(integral_table::integral_table): Remove LE_EXPR.
(pointer_table::pointer_table): Remove LE_EXPR.
* range-op.h (le_op1_op2_relation): Delete.

Unify LT_EXPR range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (foperator_lt): Remove. Move prototypes
to range-op-mixed.h
(operator_lt::fold_range): Rename from foperator_lt.
(operator_lt::op1_range): Ditto.
(float_table::float_table): Remove LT_EXPR.
* range-op-mixed.h (class operator_lt): Combined from integer
and float files.
* range-op.cc (op_lt): New object.
(unified_table::unified_table): Add LT_EXPR.
(class operator_lt): Move to range-op-mixed.h.
(lt_op1_op2_relation): Fold into
operator_lt::op1_op2_relation.
(integral_table::integral_table): Remove LT_EXPR.
(pointer_table::pointer_table): Remove LT_EXPR.
* range-op.h (lt_op1_op2_relation): Delete.

Unify NE_EXPR range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (foperator_not_equal): Remove. Move prototypes
to range-op-mixed.h
(operator_equal::fold_range): Rename from foperator_not_equal.
(operator_equal::op1_range): Ditto.
(float_table::float_table): Remove NE_EXPR.
* range-op-mixed.h (class operator_not_equal): Combined from integer
and float files.
* range-op.cc (op_equal): New object.
(unified_table::unified_table): Add NE_EXPR.
(class operator_not_equal): Move to range-op-mixed.h.
(not_equal_op1_op2_relation): Fold into
operator_not_equal::op1_op2_relation.
(integral_table::integral_table): Remove NE_EXPR.
(pointer_table::pointer_table): Remove NE_EXPR.
* range-op.h (not_equal_op1_op2_relation): Delete.

Unify EQ_EXPR range operator

Move the declaration of the class to the range-op-mixed header, add the
floating point prototypes as well, and use it in the new unified table.

* range-op-float.cc (foperator_equal): Remove. Move prototypes
to range-op-mixed.h
(operator_equal::fold_range): Rename from foperator_equal.
(operator_equal::op1_range): Ditto.
(float_table::float_table): Remove EQ_EXPR.
* range-op-mixed.h (class operator_equal): Combined from integer
and float files.
* range-op.cc (op_equal): New object.
(unified_table::unified_table): Add EQ_EXPR.
(class operator_equal): Move to range-op-mixed.h.
(equal_op1_op2_relation): Fold into
operator_equal::op1_op2_relation.
(integral_table::integral_table): Remove EQ_EXPR.
(pointer_table::pointer_table): Remove EQ_EXPR.
* range-op.h (equal_op1_op2_relation): Delete.

Provide a unified range-op table.

Create a table to prepare for unifying all operations into a single table.
Move any operators which only occur in one table to the approriate
initialization routine.
Provide a mixed header file for range-ops with multiple categories.

* range-op-float.cc (class float_table): Move to header.
(float_table::float_table): Move float only operators to...
(range_op_table::initialize_float_ops): Here.
* range-op-mixed.h: New.
* range-op.cc (integral_tree_table, pointer_tree_table): Moved
to top of file.
(float_tree_table): Moved from range-op-float.cc.
(unified_tree_table): New.
(unified_table::unified_table): New. Call initialize routines.
(get_op_handler): Check unified table first.
(range_op_handler::range_op_handler): Handle no type constructor.
(integral_table::integral_table): Move integral only operators to...
(range_op_table::initialize_integral_ops): Here.
(pointer_table::pointer_table): Move pointer only operators to...
(range_op_table::initialize_pointer_ops): Here.
* range-op.h (enum bool_range_state): Move to range-op-mixed.h.
(get_bool_state): Ditto.
(empty_range_varying): Ditto.
(relop_early_resolve): Ditto.
(class range_op_table): Add new init methods for range types.
(class integral_table): Move declaration to here.
(class pointer_table): Move declaration to here.
(class float_table): Move declaration to here.

Daily bump.

VECT: Add SELECT_VL support

This patch address comments from Richard && Richi and rebase to trunk.

This patch is adding SELECT_VL middle-end support
allow target have target dependent optimization in case of
length calculation.

This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750

The SELECT_VL is same behavior as LLVM "get_vector_length" with
these following properties:

1. Only apply on single-rgroup.
2. non SLP.
3. adjust loop control IV.
4. adjust data reference IV.
5. allow non-vf elements processing in non-final iteration

Code
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
   # { for (size_t i=0; i<n; i++) { z[i]=x[i]+y[i]; } }

Take RVV codegen for example:

Before this patch:
vvaddint32:
        ble     a0,zero,.L6
        csrr    a4,vlenb
        srli    a6,a4,2
.L4:
        mv      a5,a0
        bleu    a0,a6,.L3
        mv      a5,a6
.L3:
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v2,0(a1)
        vle32.v v1,0(a2)
        vsetvli a7,zero,e32,m1,ta,ma
        sub     a0,a0,a5
        vadd.vv v1,v1,v2
        vsetvli zero,a5,e32,m1,ta,ma
        vse32.v v1,0(a3)
        add     a2,a2,a4
        add     a3,a3,a4
        add     a1,a1,a4
        bne     a0,zero,.L4
.L6:
        ret

After this patch:

vvaddint32:
    vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
    vle32.v v0, (a1)         # Get first vector
      sub a0, a0, t0         # Decrement number done
      slli t0, t0, 2         # Multiply number done by 4 bytes
      add a1, a1, t0         # Bump pointer
    vle32.v v1, (a2)         # Get second vector
      add a2, a2, t0         # Bump pointer
    vadd.vv v2, v0, v1       # Sum vectors
    vse32.v v2, (a3)         # Store result
      add a3, a3, t0         # Bump pointer
      bnez a0, vvaddint32    # Loop back
      ret                    # Finished

Co-authored-by: Richard Sandiford<richard.sandiford@arm.com>
Co-authored-by: Richard Biener <rguenther@suse.de>
gcc/ChangeLog:

* doc/md.texi: Add SELECT_VL support.
* internal-fn.def (SELECT_VL): Ditto.
* optabs.def (OPTAB_D): Ditto.
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Ditto.
* tree-vect-stmts.cc (get_select_vl_data_ref_ptr): Ditto.
(vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): Ditto.

analyzer: add caching to globals with initializers [PR110112]

PR analyzer/110112 notes that -fanalyzer is extremely slow on a source
file with large read-only static arrays, repeatedly building the
same compound_svalue representing the full initializer, and repeatedly
building svalues representing parts of the the full initialiazer.

This patch adds caches for both of these; together they reduce the time
taken by -fanalyzer -O2 on the testcase in the bug for an optimized
build:
  91.2s : no caches (status quo)
  32.4s : cache in decl_region::get_svalue_for_constructor
   3.7s : cache in region::get_initial_value_at_main
   3.1s : both caches (this patch)

gcc/analyzer/ChangeLog:
PR analyzer/110112
* region-model.cc (region_model::get_initial_value_for_global):
Move code to region::calc_initial_value_at_main.
* region.cc (region::get_initial_value_at_main): New function.
(region::calc_initial_value_at_main): New function, based on code
in region_model::get_initial_value_for_global.
(region::region): Initialize m_cached_init_sval_at_main.
(decl_region::get_svalue_for_constructor): Add a cache, splitting
out body to...
(decl_region::calc_svalue_for_constructor): ...this new function.
* region.h (region::get_initial_value_at_main): New decl.
(region::calc_initial_value_at_main): New decl.
(region::m_cached_init_sval_at_main): New field.
(decl_region::decl_region): Initialize m_ctor_svalue.
(decl_region::calc_svalue_for_constructor): New decl.
(decl_region::m_ctor_svalue): New field.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libstdc++: use using instead of typedef for type_traits

Since the type_traits header is a C++11 header file, using can be used instead
of typedef. This patch provides more readability, especially for long type
names.

libstdc++-v3/ChangeLog:

* include/std/type_traits: Use using instead of typedef

Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

Also check type being cast to

before casting into an irange, make sure the type being cast into
is also supported.

PR ipa/109886
* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Check param
type as well.

Relocate range_cast to header, and add a generic version.

Make range_cast inlinable by moving it to the header file.
Also trap if the destination is not capable of representing the cast type.
Add a generic version which can change range classes.. ie float to int.

* range-op.cc (range_cast): Move to...
* range-op.h (range_cast): Here and add generic a version.

c++: fix 32-bit spaceship failures [PR110185]

Various spaceship tests failed after r14-1624. This turned out to be
because the comparison category classes return in memory on 32-bit targets,
and the synthesized operator<=> looks something like

if (auto v = a.x <=> b.x, v == 0); else return v;
if (auto v = a.y <=> b.y, v == 0); else return v;
etc.

so check_return_expr was trying to do NRVO for all the 'v' variables, and
now on subsequent returns we check to see if the previous NRV is still in
scope. But the NRVs didn't have names, so looking up name bindings crashed.
Fixed both by giving 'v' a name so we can NRVO the first one, and fixing the
test to give up if the old NRV has no name.

PR c++/110185
PR c++/58487

gcc/cp/ChangeLog:

* method.cc (build_comparison_op): Give retval a name.
* typeck.cc (check_return_expr): Fix for nameless variables.

c++: diagnose auto in template arg

We were failing to diagnose this Concepts TS feature that didn't make it
into C++20 because the 'auto' was getting converted to a template parameter
before we checked for it. So also check in cp_parser_simple_type_specifier.

The code in cp_parser_template_type_arg that I initially expected to
diagnose this seems unreachable because cp_parser_type_id_1 already checks
auto.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_simple_type_specifier): Check for auto
in template argument.
(cp_parser_template_type_arg): Remove auto checking.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/auto7.C: New test.
* g++.dg/concepts/auto7a.C: New test.

c++: init-list of uncopyable type [PR110102]

The maybe_init_list_as_range optimization is a form of copy elision, but we
can only elide well-formed copies.

PR c++/110102

gcc/cp/ChangeLog:

* call.cc (maybe_init_list_as_array): Check that the element type is
copyable.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-opt1.C: New test.

doc: Clarification for -Wmissing-field-initializers

The manual is incorrect in saying that the option does not warn
about designated initializers, which it does in C++. Whether the
divergence in behavior is desirable is another thing, but let's
at least make the manual match the reality.

PR c/39589
PR c++/96868

gcc/ChangeLog:

* doc/invoke.texi: Clarify that -Wmissing-field-initializers doesn't
warn about designated initializers in C only.

Add Plus to the op list of `(zero_one == 0) ? y : z <op> y` pattern

This adds plus to the op list of `(zero_one == 0) ? y : z <op> y` patterns
which currently has bit_ior and bit_xor.
This shows up now in GCC after the boolization work that Uroš has been doing.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/97711
PR tree-optimization/110155

gcc/ChangeLog:

* match.pd ((zero_one == 0) ? y : z <op> y): Add plus to the op.
((zero_one != 0) ? z <op> y : y): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/branchless-cond-add-2.c: New test.
* gcc.dg/tree-ssa/branchless-cond-add.c: New test.

Change the `(zero_one ==/!= 0) ? y : z <op> y` patterns to use multiply rather than `(-zero_one) & z`

Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` already,
it is better if we don't do a secondary transformation. This reduces the extra
statements produced by match-and-simplify on the gimple level too.

gcc/ChangeLog:

* match.pd ((zero_one ==/!= 0) ? y : z <op> y): Use
multiply rather than negation/bit_and.

MATCH: Allow unsigned types for `X & -Y -> X * Y` pattern

This allows unsigned types if the inner type where the negation is
located has greater than or equal to precision than the outer type.

branchless-cond.c needs to be updated since now we change it to
use a multiply rather than still having (-a)&c in there.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd (`X & -Y -> X * Y`): Allow for truncation
and the same type for unsigned types.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.

MATCH: Fix zero_one_valued_p not to match signed 1 bit integers

So for the attached testcase, we assumed that zero_one_valued_p would
be the value [0,1] but currently zero_one_valued_p matches also
signed 1 bit integers.
This changes that not to match that and fixes the 2 new testcases at
all optimization levels.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note the GCC 13 patch will be slightly different due to the changes
made to zero_one_valued_p.

PR tree-optimization/110165
PR tree-optimization/110166

gcc/ChangeLog:

* match.pd (zero_one_valued_p): Don't accept
signed 1-bit integers.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr110165-1.c: New test.
* gcc.c-torture/execute/pr110166-1.c: New test.

testsuite: fix the condition bug in tsvc s176

This patch fixes the problem that the loop in the tsvc s176 function is
optimized and removed because `iterations/LEN_1D` is 0 (where iterations
is set to 10000, LEN_1D is set to 32000 in tsvc.h).

This testcase passed on x86 and AArch64 system.

Best,
Lehua

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s176.c: Adjust iterations.
* gcc.dg/vect/tsvc/tsvc.h: Adjust expected rsult for s176.

libstdc++: Remove duplicate definition of _Float128 std::from_chars [PR110077]

When long double uses IEEE binary128 representation we define the
_Float128 overload of std::from_chars inline in <charconv>. My changes
in r14-1431-g7037e7b6e4ac41 cause it to also be defined non-inline in
the library, leading to an abi-check failure for (at least) sparc and
aarch64.

Suppress the definition in the library if long double and _Float128 have
are both IEEE binary128.

libstdc++-v3/ChangeLog:

PR libstdc++/110077
* src/c++17/floating_from_chars.cc (from_chars) <_Float128>:
Only define if _Float128 and long double have different
representations.

libstdc++: Add preprocessor checks to <experimental/internet> [PR100285]

We can't define endpoints and resolvers without the relevant OS support.
If IPPROTO_TCP and IPPROTO_UDP are both udnefined then we won't need
basic_endpoint and basic_resovler anyway, so make them depend on those
macros.

libstdc++-v3/ChangeLog:

PR libstdc++/100285
* include/experimental/internet [IPPROTO_TCP || IPPROTO_UDP]
(basic_endpoint, basic_resolver_entry, resolver_base)
(basic_resolver_results, basic_resolver): Only define if the tcp
or udp protocols will be defined.

libstdc++: Bump library version to libstdc++.so.6.0.33

The addition of __cxa_call_terminate@@CXXABI_1.3.15 on trunk means we
need a new version.

libstdc++-v3/ChangeLog:

* acinclude.m4 (libtool_VERSION): Update to 6.0.33.
* configure: Regenerate.
* doc/xml/manual/abi.xml: Add libstdc++.so.6.0.33.
* doc/html/manual/abi.html: Regenerate.

libstdc++: Fix P2510R3 "Formatting pointers" [PR110149]

I had intended to support the P2510R3 proposal unconditionally in C++20
mode, but I left it half implemented. The parse function supported the
new extensions, but the format function didn't.

This adds the missing pieces, and makes it only enabled for C++26 and
non-strict modes.

libstdc++-v3/ChangeLog:

PR libstdc++/110149
* include/std/format (formatter<const void*, charT>::parse):
Only alow 0 and P for C++26 and non-strict modes.
(formatter<const void*, charT>::format): Use toupper for P
type, and insert zero-fill characters for 0 option.
* testsuite/std/format/functions/format.cc: Check pointer
formatting. Only check P2510R3 extensions conditionally.
* testsuite/std/format/parse_ctx.cc: Only check P2510R3
extensions conditionally.

libstdc++: Optimize std::to_array for trivial types [PR110167]

As reported in PR libstdc++/110167, std::to_array compiles extremely
slowly for very large arrays. It needs to instantiate a very large
specialization of std::index_sequence<N...> and then create a very large
aggregate initializer from the pack expansion. For trivial types we can
simply default-initialize the std::array and then use memcpy to copy the
values. For non-trivial types we need to use the existing
implementation, despite the compilation cost.

As also noted in the PR, using a generic lambda instead of the
__to_array helper compiles faster since gcc-13. It also produces
slightly smaller code at -O1, due to additional inlining. The code at
-Os, -O2 and -O3 seems to be the same. This new implementation requires
__cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported
since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1).

libstdc++-v3/ChangeLog:

PR libstdc++/110167
* include/std/array (to_array): Initialize arrays of trivial
types using memcpy. For non-trivial types, use lambda
expressions instead of a separate helper function.
(__to_array): Remove.
* testsuite/23_containers/array/creation/110167.cc: New test.

middle-end/110182 - TYPE_PRECISION on VECTOR_TYPE causes wrong-code

When folding two conversions in a row we use TYPE_PRECISION but
that's invalid for VECTOR_TYPE. The following fixes this by
using element_precision instead.

* match.pd (two conversions in a row): Use element_precision
to DTRT for VECTOR_TYPE.

libstdc++: Improve tests for emplace member of sequence containers

Our existing tests for std::deque::emplace, std::list::emplace and
std::vector::emplace are poor. We only have compile tests for PR 52799
and the equivalent for a const_iterator as the insertion point. This
fails to check that the value is actually inserted correctly and the
right iterator is returned.

Add new tests that cover the existing 52799.cc and const_iterator.cc
compile-only tests, as well as verifying the effects are correct.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/deque/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/deque/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/list/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/list/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/vector/modifiers/emplace/52799.cc:
Removed.
* testsuite/23_containers/vector/modifiers/emplace/const_iterator.cc:
Removed.
* testsuite/23_containers/deque/modifiers/emplace/1.cc: New
test.
* testsuite/23_containers/list/modifiers/emplace/1.cc: New
test.
* testsuite/23_containers/vector/modifiers/emplace/1.cc: New
test.

RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

This patch would like to refactor the requirement of both the ZVFH
and ZVFHMIN. By default, the ZVFHMIN will enable FP16 for all the
iterators of RVV. And then the ZVFH will leverage one define attr as
the gate for FP16 supported or not.

Please note the ZVFH will cover the ZVFHMIN instructions. This patch
add one test for this.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Co-Authored by: Kito Cheng <kito.cheng@sifive.com>

gcc/ChangeLog:

* config/riscv/riscv.md (enabled): Move to another place, and
add fp_vector_disabled to the cond.
(fp_vector_disabled): New attr defined for disabling fp.
* config/riscv/vector-iterators.md: Fix V_WHOLE and V_FRACT.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add vle16 test
for ZVFHMIN.

RISC-V: Fix one warning of frm enum.

This patch would like to fix one warning similar as below, and add the
link for where the values comes from.

./gcc/config/riscv/riscv-protos.h:260:13: warning: binary constants are
a C++14 feature or GCC extension
FRM_RNE = 0b000,
^~~~~

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum frm_field_enum): Adjust
literal to int.

fortran: Fix ICE on pr96024.f90 on big-endian hosts [PR96024]

The pr96024.f90 testcase ICEs on big-endian hosts. The problem is
that length->val.integer is accessed after checking
length->expr_type == EXPR_CONSTANT, but it is a CHARACTER constant
which uses length->val.character union member instead and on big-endian
we end up reading constant 0x100000000 rather than some small number
on little-endian and if target doesn't have enough memory for 4 times
that (i.e. 16GB allocation), it ICEs.

2023-06-09 Jakub Jelinek <jakub@redhat.com>

PR fortran/96024
* primary.cc (gfc_convert_to_structure_constructor): Only do
constant string ctor length verification and truncation/padding
if constant length has INTEGER type.

Explicitly view_convert_expr mask to signed type when folding pblendvb builtins.

Since mask < 0 will be always false for vector char when
-funsigned-char, but vpblendvb needs to check the most significant
bit. The patch explicitly VCE to vector signed char.

gcc/ChangeLog:

PR target/110108
* config/i386/i386.cc (ix86_gimple_fold_builtin): Explicitly
view_convert_expr mask to signed type when folding pblendvb
builtins.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110108-2.c: New test.

Fold _mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple ABSU_EXPR + VCE.

r14-1145 fold the intrinsics into gimple ABS_EXPR which has UB for
TYPE_MIN, but PABSB will store unsigned result into dst. The patch
uses ABSU_EXPR + VCE instead of ABS_EXPR.

Also don't fold _mm_abs_{pi8,pi16,pi32} w/o TARGET_64BIT since 64-bit
vector absm2 is guarded with TARGET_MMX_WITH_SSE.

gcc/ChangeLog:

PR target/110108
* config/i386/i386.cc (ix86_gimple_fold_builtin): Fold
_mm{,256,512}_abs_{epi8,epi16,epi32,epi64} into gimple
ABSU_EXPR + VCE, don't fold _mm_abs_{pi8,pi16,pi32} w/o
TARGET_64BIT.
* config/i386/i386-builtin.def: Replace CODE_FOR_nothing with
real codename for __builtin_ia32_pabs{b,w,d}.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110108.c: New test.
* gcc.target/i386/pr110108-3.c: New test.
* gcc.target/i386/pr109900.c: Adjust testcase.

Daily bump.

PR modula2/110126 variables are reported as unused when referenced by ASM

This patches fixes two problems with the asm statement.
gm2 -Wall -c fooasm3.mod generates an incorrect warning and
gm2 cannot concatenate strings before an ASM statement.
The asm statement now accepts a constant expression (rather than
a string) and it updates the variable read/write use lists as
appropriate.

gcc/m2/ChangeLog:

PR modula2/110126
* gm2-compiler/M2GenGCC.mod (BuildTreeFromInterface): Remove
tokenno parameter. Use object tok instead of tokenno.
(BuildTrashTreeFromInterface): Use object tok instead of
GetDeclaredMod.
(CodeInline): Remove tokenno from parameter list to BuildTreeFromInterface.
* gm2-compiler/M2Quads.def (BuildAsmElement): Exported and
defined.
* gm2-compiler/M2Quads.mod (BuildOptimizeOff): Reformatted.
(BuildInline): Reformatted.
(BuildLineNo): Reformatted.
(UseLineNote): Reformatted.
(BuildAsmElement): New procedure.
* gm2-compiler/P0SyntaxCheck.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/P1Build.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/P2Build.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/P3Build.bnf (AsmOperands): Rewrite.
(AsmOperandSpec): Rewrite.
(AsmOutputList): New rule.
(AsmInputList): New rule.
(TrashList): Rewrite.
* gm2-compiler/PCBuild.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/PHBuild.bnf (AsmOperands): Use
ConstExpression instead of string.
(AsmElement): Use ConstExpression instead of string.
(TrashList): Use ConstExpression instead of string.
* gm2-compiler/SymbolTable.def (PutRegInterface):
Rewrite interface.
(GetRegInterface): Rewrite interface.
* gm2-compiler/SymbolTable.mod (SetFirstUsed): New procedure.
(PutFirstUsed): New procedure.
(PutRegInterface): Rewrite.
(GetRegInterface): Rewrite.

gcc/testsuite/ChangeLog:

PR modula2/110126
* gm2/pim/pass/fooasm3.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Provide a new dispatch mechanism for range-ops.

Simplify range_op_handler to have a single range_operator pointer and
provide a more flexible dispatch mechanism for calls via generic vrange
classes. This is more extensible for adding new classes of range support.
Any unsupported dispatch patterns will simply return FALSE now rather
than generating compile time exceptions, aleviating the need to
constantly check for supoprted types.

* gimple-range-op.cc
(gimple_range_op_handler::gimple_range_op_handler): Adjust.
(gimple_range_op_handler::maybe_builtin_call): Adjust.
* gimple-range-op.h (operand1, operand2): Use m_operator.
* range-op.cc (integral_table, pointer_table): Relocate.
(get_op_handler): Rename from get_handler and handle all types.
(range_op_handler::range_op_handler): Relocate.
(range_op_handler::set_op_handler): Relocate and adjust.
(range_op_handler::range_op_handler): Relocate.
(dispatch_trio): New.
(RO_III, RO_IFI, RO_IFF, RO_FFF, RO_FIF, RO_FII): New consts.
(range_op_handler::dispatch_kind): New.
(range_op_handler::fold_range): Relocate and Use new dispatch value.
(range_op_handler::op1_range): Ditto.
(range_op_handler::op2_range): Ditto.
(range_op_handler::lhs_op1_relation): Ditto.
(range_op_handler::lhs_op2_relation): Ditto.
(range_op_handler::op1_op2_relation): Ditto.
(range_op_handler::set_op_handler): Use m_operator member.
* range-op.h (range_op_handler::operator bool): Use m_operator.
(range_op_handler::dispatch_kind): New.
(range_op_handler::m_valid): Delete.
(range_op_handler::m_int): Delete
(range_op_handler::m_float): Delete
(range_op_handler::m_operator): New.
(range_op_table::operator[]): Relocate from .cc file.
(range_op_table::set): Ditto.
* value-range.h (class vrange): Make range_op_handler a friend.

Unify range_operators to one class.

Range_operator and range_operator_float are 2 different classes, making
generalized dispatch difficult.  The distinction between what is a float
operator and what is an integral operator also blurs when some methods
have multiple types.  ie, casts : INT = FLOAT and FLOAT = INT

This patch unifies all possible invocation patterns in one class, and
switches the float table to use the general range_op_table.

* gimple-range-op.cc (cfn_constant_float_p): Change base class.
(cfn_pass_through_arg1): Adjust using statemenmt.
(cfn_signbit): Change base class, adjust using statement.
(cfn_copysign): Ditto.
(cfn_sqrt): Ditto.
(cfn_sincos): Ditto.
* range-op-float.cc (fold_range): Change class to range_operator.
(rv_fold): Ditto.
(op1_range): Ditto
(op2_range): Ditto
(lhs_op1_relation): Ditto.
(lhs_op2_relation): Ditto.
(op1_op2_relation): Ditto.
(foperator_*): Ditto.
(class float_table): New.  Inherit from range_op_table.
(floating_tree_table) Change to range_op_table pointer.
(class floating_op_table): Delete.
* range-op.cc (operator_equal): Adjust using statement.
(operator_not_equal): Ditto.
(operator_lt, operator_le, operator_gt, operator_ge): Ditto.
(operator_minus, operator_cast): Ditto.
(operator_bitwise_and, pointer_plus_operator): Ditto.
(get_float_handle): Change return type.
* range-op.h (range_operator_float): Delete.  Relocate all methods
into class range_operator.
(range_op_handler::m_float): Change type to range_operator.
(floating_op_table): Delete.
(floating_tree_table): Change type.

Remove tree_code from range-operator.

Range_operator had a tree code added last release to facilitate
bitmask operations. This removes the tree_code and replaces it with a
virtual routine to peform the masking. Remove any duplicate instances
which are no longer needed.

* range-op.cc (range_operator::fold_range): Call virtual routine.
(range_operator::update_bitmask): New.
(operator_equal::update_bitmask): New.
(operator_not_equal::update_bitmask): New.
(operator_lt::update_bitmask): New.
(operator_le::update_bitmask): New.
(operator_gt::update_bitmask): New.
(operator_ge::update_bitmask): New.
(operator_ge::update_bitmask): New.
(operator_plus::update_bitmask): New.
(operator_minus::update_bitmask): New.
(operator_pointer_diff::update_bitmask): New.
(operator_min::update_bitmask): New.
(operator_max::update_bitmask): New.
(operator_mult::update_bitmask): New.
(operator_div:operator_div):New.
(operator_div::update_bitmask): New.
(operator_div::m_code): New member.
(operator_exact_divide::operator_exact_divide): New constructor.
(operator_lshift::update_bitmask): New.
(operator_rshift::update_bitmask): New.
(operator_bitwise_and::update_bitmask): New.
(operator_bitwise_or::update_bitmask): New.
(operator_bitwise_xor::update_bitmask): New.
(operator_trunc_mod::update_bitmask): New.
(op_ident, op_unknown, op_ptr_min_max): New.
(op_nop, op_convert): Delete.
(op_ssa, op_paren, op_obj_type): Delete.
(op_realpart, op_imagpart): Delete.
(op_ptr_min, op_ptr_max): Delete.
(pointer_plus_operator:update_bitmask): New.
(range_op_table::set): Do not use m_code.
(integral_table::integral_table): Adjust to single instances.
* range-op.h (range_operator::range_operator): Delete.
(range_operator::m_code): Delete.
(range_operator::update_bitmask): New.

Fix floating point bug in fold_range.

We currently do not have any floating point operators where operand 1 is
a different type than the LHS. When we eventually do there is a bug
in fold_range. If either operand is a known NAN, it returns a NAN
of the type of operand 1 instead of the result type.

* range-op-float.cc (range_operator_float::fold_range): Return
NAN of the result type.

RISC-V: Add more test cases for RVV FP16

This patch would like to add new test cases to make sure the
RVV FP16 works well as expected.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Add new cases.
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: New test.

analyzer: Standalone OOB-warning [PR109437, PR109439]

This patch enhances -Wanalyzer-out-of-bounds that is no longer paired
with a -Wanalyzer-use-of-uninitialized-value on out-of-bounds-read.

This also fixes PR analyzer/109437.
Before there could always be at most one OOB-read warning per frame because
-Wanalyzer-use-of-uninitialized-value always terminates the analysis
path.

PR 109439

gcc/analyzer/ChangeLog:

* bounds-checking.cc (region_model::check_symbolic_bounds): Returns whether the BASE_REG
region access was OOB.
(region_model::check_region_bounds): Likewise.
* region-model.cc (region_model::get_store_value): Creates an
unknown svalue on OOB-read access to REG.
(region_model::check_region_access): Returns whether an unknown svalue needs be created.
(region_model::check_region_for_read): Passes check_region_access return value.
* region-model.h: Update prior function definitions.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/out-of-bounds-2.c: Cleaned test for uninitialized-value warning
* gcc.dg/analyzer/out-of-bounds-5.c: Likewise.
* gcc.dg/analyzer/pr101962.c: Likewise.
* gcc.dg/analyzer/realloc-5.c: Likewise.
* gcc.dg/analyzer/pr109439.c: New test.

optabs: Implement double-word ctz and ffs expansion

We have expand_doubleword_clz for a couple of years, where we emit
double-word CLZ as if (high_word == 0) return CLZ (low_word) + word_size;
else return CLZ (high_word);
We can do something similar for CTZ and FFS IMHO, just with the 2
words swapped.  So if (low_word == 0) return CTZ (high_word) + word_size;
else return CTZ (low_word); for CTZ and
if (low_word == 0) { return high_word ? FFS (high_word) + word_size : 0;
else return FFS (low_word);

The following patch implements that.

Note, on some targets which implement both word_mode ctz and ffs patterns,
it might be better to incrementally implement those double-word ffs expansion
patterns in md files, because we aren't able to optimize it correctly;
nothing can detect we have just made sure that argument is not 0 and so
don't need to bother with handling that case.  So, on ia32 just using
CTZ patterns would be better there, but I think we can even do better and
instead of doing the comparisons of the operands against 0 do the CTZ
expansion followed by testing of flags.

2023-06-08  Jakub Jelinek  <jakub@redhat.com>

* optabs.cc (expand_ffs): Add forward declaration.
(expand_doubleword_clz): Rename to ...
(expand_doubleword_clz_ctz_ffs): ... this.  Add UNOPTAB argument,
handle also doubleword CTZ and FFS in addition to CLZ.
(expand_unop): Adjust caller.  Also call it for doubleword
ctz_optab and ffs_optab.

* gcc.target/i386/ctzll-1.c: New test.
* gcc.target/i386/ffsll-1.c: New test.

i386: Fix endless recursion in ix86_expand_vector_init_general with MMX [PR110152]

I'm getting
+FAIL: gcc.target/i386/3dnow-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/3dnow-1.c (test for excess errors)
+FAIL: gcc.target/i386/3dnow-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/3dnow-2.c (test for excess errors)
+FAIL: gcc.target/i386/mmx-1.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/mmx-1.c (test for excess errors)
+FAIL: gcc.target/i386/mmx-2.c (internal compiler error: Segmentation fault signal terminated program cc1)
+FAIL: gcc.target/i386/mmx-2.c (test for excess errors)
regressions on i686-linux since r14-1166. The problem is when
ix86_expand_vector_init_general is called with mmx_ok = true and
mode = V4HImode, it newly recurses with mmx_ok = false and mode = V2SImode,
but as mmx_ok is false and !TARGET_SSE, we recurse again with the same
arguments (ok, fresh new tmp and vals) infinitely.
The following patch fixes that by passing mmx_ok to that recursive call.
For n_words == 4 it isn't needed, because we only care about mmx_ok for
V2SImode or V2SFmode and no other modes.

2023-06-08 Jakub Jelinek <jakub@redhat.com>

PR target/110152
* config/i386/i386-expand.cc (ix86_expand_vector_init_general): For
n_words == 2 recurse with mmx_ok as first argument rather than false.

Fortran: Fix some more blockers in associate meta-bug [PR87477]

2023-06-08 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/87477
PR fortran/99350
PR fortran/107821
PR fortran/109451
* decl.cc (char_len_param_value): Simplify a copy of the expr
and replace the original if there is no error.
* gfortran.h : Remove the redundant field 'rankguessed' from
'gfc_association_list'.
* resolve.cc (resolve_assoc_var): Remove refs to 'rankguessed'.
(resolve_variable): Associate names with constant or structure
constructor targets cannot have array refs.
* trans-array.cc (gfc_conv_expr_descriptor): Guard expression
character length backend decl before using it. Suppress the
assignment if lhs equals rhs.
* trans-io.cc (gfc_trans_transfer): Scalarize transfer of
associate variables pointing to a variable. Add comment.
* trans-stmt.cc (trans_associate_var): Remove requirement that
the character length be deferred before assigning the value
returned by gfc_conv_expr_descriptor. Also, guard the backend
decl before testing with VAR_P.

gcc/testsuite/
PR fortran/99350
* gfortran.dg/pr99350.f90 : New test.

PR fortran/107821
* gfortran.dg/associate_5.f03 : Changed error message.
* gfortran.dg/pr107821.f90 : New test.

PR fortran/109451
* gfortran.dg/associate_61.f90 : New test

[testsuite] bump some tsvc timeouts

Several tests are timing out when targeting x86-*-vxworks with qemu.

Bump their timeout factor.

for gcc/testsuite/ChangeLog

* gcc.dg/vect/tsvc/vect-tsvc-s116.c: Bump timeout factor.
* gcc.dg/vect/tsvc/vect-tsvc-s241.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s254.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s271.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s2711.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s2712.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s276.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-vdotr.c: Likewise.

Daily bump.

[Committed] Bug fix to new wi::bitreverse_large function.

Richard Sandiford was, of course, right to be warry of new code without
much test coverage.  Converting the nvptx backend to use the BITREVERSE
rtx infrastructure, has resulted in far more exhaustive testing and
revealed a subtle bug in the new wi::bitreverse implementation.  The
code needs to use HOST_WIDE_INT_1U (instead of 1) to avoid unintended
sign extension.

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(with a minor tweak to use BITREVERSE), where it fixes regressions of
the 32-bit test vectors in gcc.target/nvptx/brev-2.c and the 64-bit
test vectors in gcc.target/nvptx/brevll-2.c.  Committed as obvious.

2023-06-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* wide-int.cc (wi::bitreverse_large): Use HOST_WIDE_INT_1U to
avoid sign extension/undefined behaviour when setting each bit.

Add support for stc and cmc instructions in i386.md

This patch is the latest revision of my patch to add support for the
STC (set carry flag) and CMC (complement carry flag) instructions to
the i386 backend, incorporating Uros' previous feedback.  The significant
changes are (i) the inclusion of CMC, (ii) the use of UNSPEC for pattern,
(iii) Use of a new X86_TUNE_SLOW_STC tuning flag to use alternate
implementations on pentium4 (which has a notoriously slow STC) when
not optimizing for size.

An example of the use of the stc instruction is:
unsigned int foo (unsigned int a, unsigned int b, unsigned int *c) {
  return __builtin_ia32_addcarryx_u32 (1, a, b, c);
}

which previously generated:
        movl    $1, %eax
        addb    $-1, %al
        adcl    %esi, %edi
        setc    %al
        movl    %edi, (%rdx)
        movzbl  %al, %eax
        ret

with this patch now generates:
        stc
        adcl    %esi, %edi
        setc    %al
        movl    %edi, (%rdx)
        movzbl  %al, %eax
        ret

An example of the use of the cmc instruction (where the carry from
a first adc is inverted/complemented as input to a second adc) is:
unsigned int bar (unsigned int a, unsigned int b,
                  unsigned int c, unsigned int d)
{
  unsigned int c1 = __builtin_ia32_addcarryx_u32 (1, a, b, &o1);
  return __builtin_ia32_addcarryx_u32 (c1 ^ 1, c, d, &o2);
}

which previously generated:
        movl    $1, %eax
        addb    $-1, %al
        adcl    %esi, %edi
        setnc   %al
        movl    %edi, o1(%rip)
        addb    $-1, %al
        adcl    %ecx, %edx
        setc    %al
        movl    %edx, o2(%rip)
        movzbl  %al, %eax
        ret

and now generates:
        stc
        adcl    %esi, %edi
        cmc
        movl    %edi, o1(%rip)
        adcl    %ecx, %edx
        setc    %al
        movl    %edx, o2(%rip)
        movzbl  %al, %eax
        ret

This version implements Uros' suggestions/refinements. (i) Avoid the
UNSPEC_CMC by using the canonical RTL idiom for *x86_cmc, (ii) Use
peephole2s to convert x86_stc and *x86_cmc into alternate forms on
TARGET_SLOW_STC CPUs (pentium4), when a suitable QImode register is
available, (iii) Prefer the addqi_cconly_overflow idiom (addb $-1,%al)
over negqi_ccc_1 (neg %al) for setting the carry from a QImode value,
These changes required two minor edits to i386.cc:  ix86_cc_mode had
to be tweaked to suggest CCCmode for the new *x86_cmc pattern, and
*x86_cmc needed to be handled/parameterized in ix86_rtx_costs so that
combine would appreciate that this complex RTL expression was actually
a fast, single byte instruction [i.e. preferable].

2022-06-07  Roger Sayle  <roger@nextmovesoftware.com>
    Uros Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_builtin) <handlecarry>:
Use new x86_stc instruction when the carry flag must be set.
* config/i386/i386.cc (ix86_cc_mode): Use CCCmode for *x86_cmc.
(ix86_rtx_costs): Provide accurate rtx_costs for *x86_cmc.
* config/i386/i386.h (TARGET_SLOW_STC): New define.
* config/i386/i386.md (UNSPEC_STC): New UNSPEC for stc.
(x86_stc): New define_insn.
(define_peephole2): Convert x86_stc into alternate implementation
on pentium4 without -Os when a QImode register is available.
(*x86_cmc): New define_insn.
(define_peephole2): Convert *x86_cmc into alternate implementation
on pentium4 without -Os when a QImode register is available.
(*setccc): New define_insn_and_split for a no-op CCCmode move.
(*setcc_qi_negqi_ccc_1_<mode>): New define_insn_and_split to
recognize (and eliminate) the carry flag being copied to itself.
(*setcc_qi_negqi_ccc_2_<mode>): Likewise.
* config/i386/x86-tune.def (X86_TUNE_SLOW_STC): New tuning flag.

gcc/testsuite/ChangeLog
* gcc.target/i386/cmc-1.c: New test case.
* gcc.target/i386/stc-1.c: Likewise.

c++: allow NRV and non-NRV returns [PR58487]

Now that we support NRV from an inner block, we can also support non-NRV
returns from other blocks, since once the NRV is out of scope a later return
expression can't possibly alias it.

This fixes 58487 and half-fixes 53637: now one of the returns is elided, but
not the other.

Fixing the remaining xfails in these testcases will require a very different
approach, probably involving a full tree/block walk from finalize_nrv, and
check_return_expr only adding to a list of potential return variables.

PR c++/58487
PR c++/53637

gcc/cp/ChangeLog:

* cp-tree.h (INIT_EXPR_NRV_P): New.
* semantics.cc (finalize_nrv_r): Check it.
* name-lookup.h (decl_in_scope_p): Declare.
* name-lookup.cc (decl_in_scope_p): New.
* typeck.cc (check_return_expr): Allow non-NRV
returns if the NRV is no longer in scope.

gcc/testsuite/ChangeLog:

* g++.dg/opt/nrv26.C: New test.
* g++.dg/opt/nrv26a.C: New test.
* g++.dg/opt/nrv27.C: New test.

MATCH: Fix comment for `(zero_one ==/!= 0) ? y : z <op> y` patterns

The patterns match more than just `a & 1` so change the comment
for these two patterns to say that.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd: Fix comment for the
`(zero_one ==/!= 0) ? y : z <op> y` patterns.

RISC-V: Eliminate extension after for *w instructions

This patch tries to prevent generating unnecessary sign extension
after *w instructions like "addiw" or "divw".

The main idea of it is to add SUBREG_PROMOTED fields during expanding.

I have tested on SPEC2017 there is no regression.
Only gcc.dg/pr30957-1.c test failed.
To solve that I did some changes in loop-iv.cc, but not sure that it is
suitable.

gcc/ChangeLog:
* config/riscv/bitmanip.md (rotrdi3, rotrsi3, rotlsi3): New expanders.
(rotrsi3_sext): Expose generator.
(rotlsi3 pattern): Hide generator.
* config/riscv/riscv-protos.h (riscv_emit_binary): New function
declaration.
* config/riscv/riscv.cc (riscv_emit_binary): Removed static
* config/riscv/riscv.md (addsi3, subsi3, negsi2): Hide generator.
(mulsi3, <optab>si3): Likewise.
(addsi3, subsi3, negsi2, mulsi3, <optab>si3): New expanders.
(addv<mode>4, subv<mode>4, mulv<mode>4): Use riscv_emit_binary.
(<u>mulsidi3): Likewise.
(addsi3_extended, subsi3_extended, negsi2_extended): Expose generator.
(mulsi3_extended, <optab>si3_extended): Likewise.
(splitter for shadd feeding divison): Update RTL pattern to account
for changes in how 32 bit ops are expanded for TARGET_64BIT.
* loop-iv.cc (get_biv_step_1): Process src of extension when it PLUS.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/shift-and-2.c: New tests.
* gcc.target/riscv/shift-shift-2.c: Adjust expected output.
* gcc.target/riscv/sign-extend.c: New test.
* gcc.target/riscv/zbb-rol-ror-03.c: Adjust expected output.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

riscv: Fix scope for memory model calculation

During libgcc configure stage for riscv32-none-elf, when
"--enable-checking=yes,rtl" has been activated, the following error
is observed:

  during RTL pass: final
  conftest.c: In function 'main':
  conftest.c:16:1: internal compiler error: RTL check: expected code 'const_int', have 'reg' in riscv_print_operand, at config/riscv/riscv.cc:4462
     16 | }
        | ^
  0x843c4d rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
          /mnt/nvme/dinux/local-workspace/gcc/gcc/rtl.cc:916
  0x8ea823 riscv_print_operand
          /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:4462
  0xde84b5 output_operand(rtx_def*, int)
          /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3632
  0xde8ef8 output_asm_insn(char const*, rtx_def**)
          /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3544
  0xded33b output_asm_insn(char const*, rtx_def**)
          /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:3421
  0xded33b final_scan_insn_1
          /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2841
  0xded6cb final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
          /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:2887
  0xded8b7 final_1
          /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:1979
  0xdee518 rest_of_handle_final
          /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4240
  0xdee518 execute
          /mnt/nvme/dinux/local-workspace/gcc/gcc/final.cc:4318

Fix by moving the calculation of memmodel to the cases where it is used.

Regression tested for riscv32-none-elf. No changes in gcc.sum and
g++.sum.

PR target/109725

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Calculate
memmodel only when it is valid.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

riscv: Fix insn cost calculation

When building riscv32-none-elf with "--enable-checking=yes,rtl", the
following ICE is observed:

  cc1: internal compiler error: RTL check: expected code 'const_int', have 'const_double' in riscv_const_insns, at config/riscv/riscv.cc:1313
  0x843c4d rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
          /mnt/nvme/dinux/local-workspace/gcc/gcc/rtl.cc:916
  0x8eab61 riscv_const_insns(rtx_def*)
          /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:1313
  0x15443bb riscv_legitimate_constant_p
          /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv.cc:826
  0xdd3c71 emit_move_insn(rtx_def*, rtx_def*)
          /mnt/nvme/dinux/local-workspace/gcc/gcc/expr.cc:4310
  0x15f28e5 run_const_vector_selftests
          /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv-selftests.cc:285
  0x15f37bd selftest::riscv_run_selftests()
          /mnt/nvme/dinux/local-workspace/gcc/gcc/config/riscv/riscv-selftests.cc:364
  0x1f6fba9 selftest::run_tests()
          /mnt/nvme/dinux/local-workspace/gcc/gcc/selftest-run-tests.cc:111
  0x11d1f39 toplev::run_self_tests()
          /mnt/nvme/dinux/local-workspace/gcc/gcc/toplev.cc:2185

Fix by following the spirit of the adjacent comment, and using the
dedicated riscv_const_insns() function to calculate cost for loading a
constant element.  Infinite recursion is not possible because the first
invocation is on a CONST_VECTOR, whereas the second is on a single
element of the vector (e.g. CONST_INT or CONST_DOUBLE).

Regression tested for riscv32-none-elf. No changes in gcc.sum and
g++.sum.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Recursively call
for constant element of a vector.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

libstdc++: Fix up 20_util/to_chars/double.cc test for excess precision [PR110145]

This test apparently contains 3 problematic floating point constants,
1e126, 4.91e-6 and 5.547e-6. These constants suffer from double rounding
when -fexcess-precision=standard evaluates double constants in the precision
of Intel extended 80-bit long double.
As written in the PR, e.g. the first one is
0x1.7a2ecc414a03f7ff6ca1cb527787b130a97d51e51202365p+418
in the precision of GCC's internal format, 80-bit long double has
63-bit precision, so the above constant rounded to long double is
0x1.7a2ecc414a03f800p+418L
(the least significant bit in the 0 before p isn't there already).
0x1.7a2ecc414a03f800p+418L rounded to IEEE double is
0x1.7a2ecc414a040p+418.
Now, if excess precision doesn't happen and we round the GCC's internal
format number directly to double, it is
0x1.7a2ecc414a03fp+418 and that is the number the test expects.
One can see it on x86-64 (where excess precision to long double doesn't
happen) where double(1e126L) != 1e126.
The other two constants suffer from the same problem.

The following patch tweaks the testcase, such that those problematic
constants are used only if FLT_EVAL_METHOD is 0 or 1 (i.e. when we have
guarantee the constants will be evaluated in double precision),
plus adds corresponding tests with hexadecimal constants which don't
suffer from this excess precision problem, they are exact in double
and long double can hold all double values.

2023-06-07 Jakub Jelinek <jakub@redhat.com>

PR libstdc++/110145
* testsuite/20_util/to_chars/double.cc: Include <cfloat>.
(double_to_chars_test_cases,
double_scientific_precision_to_chars_test_cases_2,
double_fixed_precision_to_chars_test_cases_2): #if out 1e126, 4.91e-6
and 5.547e-6 tests if FLT_EVAL_METHOD is negative or larger than 1.
Add unconditional tests with corresponding double constants
0x1.7a2ecc414a03fp+418, 0x1.4981285e98e79p-18 and
0x1.7440bbff418b9p-18.

match.pd: Improve zero_one_valued_p

Recently zero_one_valued_p was changed to handle integer_zerop
case specially, because tree_nonzero_bits (@0) == 1 only returns
true for non-constant values with range [0, 1] or constant 1,
constant 0 has tree_nonzero_bits (integer_zero_node) == 0.

The following patch reverts that change and instead checks
that tree_nonzero_bits is <= 1U.

2023-06-07 Jakub Jelinek <jakub@redhat.com>

* match.pd (zero_one_valued_p): Don't handle integer_zerop specially,
instead compare tree_nonzero_bits <= 1U rather than just == 1.

aarch64: Allow compiler to define ls64 builtins [PR110132]

This patch refactors the ls64 builtins to allow the compiler to define them
directly instead of having wrapper functions in arm_acle.h. This should be not
only easier to maintain, but it makes two important correctness fixes:
- It fixes PR110132, where the builtins ended up getting declared with
invisible bindings in the C FE, so the FE ended up synthesizing
incompatible implicit definitions for these builtins.
- It allows the builtins to be used with LTO, which didn't work previously.

We also take the opportunity to add test coverage from C++ for these
builtins.

gcc/ChangeLog:

PR target/110132
* config/aarch64/aarch64-builtins.cc (aarch64_general_simulate_builtin):
New. Use it ...
(aarch64_init_ls64_builtins): ... here. Switch to declaring public ACLE
names for builtins.
(aarch64_general_init_builtins): Ensure we invoke the arm_acle.h
setup if in_lto_p, just like we do for SVE.
* config/aarch64/arm_acle.h: (__arm_ld64b): Delete.
(__arm_st64b): Delete.
(__arm_st64bv): Delete.
(__arm_st64bv0): Delete.

gcc/testsuite/ChangeLog:

PR target/110132
* lib/target-supports.exp (check_effective_target_aarch64_asm_FUNC_ok):
Extend to ls64.
* g++.target/aarch64/acle/acle.exp: New.
* g++.target/aarch64/acle/ls64.C: New test.
* g++.target/aarch64/acle/ls64_lto.C: New test.
* gcc.target/aarch64/acle/ls64_lto.c: New test.
* gcc.target/aarch64/acle/pr110132.c: New test.

aarch64: Fix wrong code with st64b builtin [PR110100]

The st64b pattern incorrectly had an output constraint on the register
operand containing the destination address for the store, leading to
wrong code. This patch fixes that.

gcc/ChangeLog:

PR target/110100
* config/aarch64/aarch64-builtins.cc (aarch64_expand_builtin_ls64):
Use input operand for the destination address.
* config/aarch64/aarch64.md (st64b): Fix constraint on address
operand.

gcc/testsuite/ChangeLog:

PR target/110100
* gcc.target/aarch64/acle/pr110100.c: New test.

aarch64: Fix whitespace in ls64 builtin implementation [PR110100]

The ls64 builtin code was using incorrect GNU style with eight spaces where
there should be a tab. Fixed thusly.

gcc/ChangeLog:

PR target/110100
* config/aarch64/aarch64-builtins.cc (aarch64_init_ls64_builtins_types):
Replace eight consecutive spaces with tabs.
(aarch64_init_ls64_builtins): Likewise.
(aarch64_expand_builtin_ls64): Likewise.
* config/aarch64/aarch64.md (ld64b): Likewise.
(st64b): Likewise.
(st64bv): Likewise
(st64bv0): Likewise.

libgcc: Fix eh_frame fast path in find_fde_tail

The eh_frame value is only used by linear_search_fdes, not the binary
search directly in find_fde_tail, so the bug is not immediately
apparent with most programs.

Fixes commit e724b0480bfa5ec04f39be8c7290330b495c59de ("libgcc:
Special-case BFD ld unwind table encodings in find_fde_tail").

libgcc/

PR libgcc/109712
* unwind-dw2-fde-dip.c (find_fde_tail): Correct fast path for
parsing eh_frame.

libstdc++: Restore accidentally removed version in abi-check

In r14-1583-g192665feef7129 I meant to add CXXABI_1.3.15 but instead I
replaced CXXABI_1.3.14 with it. This restores the CXXABI_1.3.14 version.

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_abi.cc (check_version): Re-add
CXXABI_1.3.14.

libstdc++: Fix some tests that fail with -fno-exceptions

libstdc++-v3/ChangeLog:

* testsuite/18_support/nested_exception/rethrow_if_nested-term.cc:
Require effective target exceptions_enabled instead of using
dg-skip-if.
* testsuite/23_containers/vector/capacity/constexpr.cc: Expect
shrink_to_fit() to be a no-op without exceptions enabled.
* testsuite/23_containers/vector/capacity/shrink_to_fit.cc:
Likewise.
* testsuite/ext/bitmap_allocator/check_allocate_max_size.cc:
Require effective target exceptions_enabled.
* testsuite/ext/malloc_allocator/check_allocate_max_size.cc:
Likewise.
* testsuite/ext/mt_allocator/check_allocate_max_size.cc:
Likewise.
* testsuite/ext/new_allocator/check_allocate_max_size.cc:
Likewise.
* testsuite/ext/pool_allocator/check_allocate_max_size.cc:
Likewise.
* testsuite/ext/throw_allocator/check_allocate_max_size.cc:
Likewise.

libstdc++: Fix some tests that fail with -fexcess-precision=standard

libstdc++-v3/ChangeLog:

* testsuite/20_util/duration/cons/2.cc: Use values that aren't
affected by rounding.
* testsuite/20_util/from_chars/5.cc: Cast arithmetic result to
double before comparing for equality.
* testsuite/20_util/from_chars/6.cc: Likewise.
* testsuite/20_util/variant/86874.cc: Use values that aren't
affected by rounding.
* testsuite/25_algorithms/lower_bound/partitioned.cc: Compare to
original value instead of to floating-point-literal.
* testsuite/26_numerics/random/discrete_distribution/cons/range.cc:
Cast arithmetic result to double before comparing for equality.
* testsuite/26_numerics/random/piecewise_constant_distribution/cons/range.cc:
Likewise.
* testsuite/26_numerics/random/piecewise_linear_distribution/cons/range.cc:
Likewise.
* testsuite/26_numerics/valarray/transcend.cc (eq): Check that
the absolute difference is less than 0.01 instead of comparing
to two decimal places.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/01.cc:
Cast arithmetic result to double before comparing for equality.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/09.cc:
Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/10.cc:
Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/01.cc:
Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/09.cc:
Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/10.cc:
Likewise.
* testsuite/ext/random/hoyt_distribution/cons/parms.cc: Likewise.

RA: Constrain class of pic offset table pseudo to general regs

On some targets an integer pseudo can be assigned to a FP reg.  For
pic offset table pseudo it means we will reload the pseudo in this
case and, as a consequence, memory containing the pseudo might be
recognized as wrong one.  The patch fix this problem.

        PR target/109541

gcc/ChangeLog:

* ira-costs.cc: (find_costs_and_classes): Constrain classes of pic
offset table pseudo to a general reg subset.

gcc/testsuite/ChangeLog:

* gcc.target/sparc/pr109541.c: New.

aarch64: Represent SQXTUN with RTL operations

This patch removes UNSPEC_SQXTUN and uses organic RTL codes to represent the operation.
SQXTUN is an odd one. It's described in the architecture as "Signed saturating extract Unsigned Narrow".
It's not a straightforward ss_truncate nor a us_truncate.
It is a sort of truncating signed clamp operation with limits derived from the unsigned extrema of the narrow mode:
(truncate:N
  (smin:M
    (smax:M (reg:M) (const_int 0))
    (const_int <unsigned-max-for-mode-N>)))

This patch implements these semantics. I've checked that the vqmovun tests in advsimd-intrinsics.exp
now get constant-folded and still pass validation, so I'm pretty confident in the semantics.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_sqmovun<mode><vczle><vczbe>):
Rename to...
(*aarch64_sqmovun<mode>_insn<vczle><vczbe>): ... This.  Reimplement
with RTL codes.
(aarch64_sqmovun<mode> [SD_HSDI]): Reimplement with RTL codes.
(aarch64_sqxtun2<mode>_le): Likewise.
(aarch64_sqxtun2<mode>_be): Likewise.
(aarch64_sqxtun2<mode>): Adjust for the above.
(aarch64_sqmovun<mode>): New define_expand.
* config/aarch64/iterators.md (UNSPEC_SQXTUN): Delete.
(half_mask): New mode attribute.
* config/aarch64/predicates.md (aarch64_simd_umax_half_mode):
New predicate.

aarch64: Improve RTL representation of ADDP instructions

Similar to the ADDLP instructions the non-widening ADDP ones can be
represented by adding the odd lanes with the even lanes of a vector.
These instructions take two vector inputs and the architecture spec
describes the operation as concatenating them together before going
through it with pairwise additions.
This patch chooses to represent ADDP on 64-bit and 128-bit input
vectors slightly differently, reasons explained in the comments
in aarhc64-simd.md.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_addp<mode><vczle><vczbe>):
Reimplement as...
(aarch64_addp<mode>_insn): ... This...
(aarch64_addp<mode><vczle><vczbe>_insn): ... And this.
(aarch64_addp<mode>): New define_expand.

Revert "libstdc++: Use AS_IF in configure.ac"

This reverts commit 97a5e8a2a48d162744a5bd60a012ce6fca13cbbe.

libstdc++-v3/ChangeLog:

* configure: Regenerate.
* configure.ac:

Fix expected test output on hppa

Recent changes in the hoisting code change the optimized gimple for the
shadd-3 testcase on the PA. That in turn changes the number of expected
shadd instructions.

I'm not entirely sure the test is actually testing what we want anymore
since I don't see a CSE for postreload to discover. But I did verify
that the number of shadd instructions is sane, so I just changed the
count in the obvious way.

gcc/testsuite
* gcc.target/hppa/shadd-3.c: Update expected output.

testsuite/libgomp.*/target-present-*.{c,f90}: Improve and fix

One of the testcases lacked variables in a map clause such that
the fail occurred too early. Additionally, it would have failed
for all those non-host devices where 'present' is always true, i.e.
non-host devices which can access all of the host memory
(shared-memory devices). [There are currently none.]

The commit now runs the code on all devices, which should succeed
for host fallback and for shared-memory devices, finding potenial issues
that way. Additionally, a checkpoint (required stdout output) is used
to ensure that the execution won't fail (with the same error) before
reaching the expected fail location.

2023-06-07 Thomas Schwinge <thomas@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>

libgomp/
* testsuite/libgomp.c-c++-common/target-present-1.c: Run code
also for non-offload_device targets; check that it runs
successfully for those and for all until a checkpoint for all
* testsuite/libgomp.c-c++-common/target-present-2.c: Likewise.
* testsuite/libgomp.c-c++-common/target-present-3.c: Likewise.
* testsuite/libgomp.fortran/target-present-1.f90: Likewise.
* testsuite/libgomp.fortran/target-present-3.f90: Likewise.
* testsuite/libgomp.fortran/target-present-2.f90: Likewise;
add missing vars to map clause.

Support 'UNSUPPORTED: [...]: exception handling disabled' for libstdc++ testing

Verbatim copy of what was added to 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune'
in Subversion r279246 (Git commit a9046e9853024206bec092dd63e21e152cb5cbca)
"[MSP430] -Add fno-exceptions multilib".

This greatly improves 'make check-target-libstdc++-v3' results for, for
example, x86_64-pc-linux-gnu with:

RUNTESTFLAGS='--target_board=unix/-fno-exceptions\{,-m32\}'

libstdc++-v3/
* testsuite/lib/prune.exp (libstdc++-dg-prune): Support
'UNSUPPORTED: [...]: exception handling disabled'.

modula2: Fix bootstrap

internal-fn.h since yesterday includes insn-opinit.h, which is a generated
header.
One of my bootstraps today failed because some m2 sources started compiling
before insn-opinit.h has been generated.

Normally, gcc/Makefile.in has
  # In order for parallel make to really start compiling the expensive
  # objects from $(OBJS) as early as possible, build all their
  # prerequisites strictly before all objects.
  $(ALL_HOST_OBJS) : | $(generated_files)

rule which ensures that all the generated files are generated before
any $(ALL_HOST_OBJS) objects start, but use order-only dependency for
this because we don't want to rebuild most of the objects whenever one
generated header is regenerated.  After the initial build in an empty
directory we'll have .deps/ files contain the detailed dependencies.

$(ALL_HOST_OBJS) includes even some FE files, I think in the m2 case
would be m2_OBJS, but m2/Make-lang.in doesn't define those.

The following patch just adds a similar rule to m2/Make-lang.in.
Another option would be to set m2_OBJS variable in m2/Make-lang.in to
something, but not really sure to which exactly and why it isn't
done.

2023-06-07  Jakub Jelinek  <jakub@redhat.com>

* Make-lang.in: Build $(generated_files) before building
all $(GM2_C_OBJS).

RISC-V: Support RVV VLA SLP auto-vectorization

This patch enables basic VLA SLP auto-vectorization.
Consider this following case:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8 + 0] = b[i * 8 + 7] + 1;
      a[i * 8 + 1] = b[i * 8 + 7] + 2;
      a[i * 8 + 2] = b[i * 8 + 7] + 8;
      a[i * 8 + 3] = b[i * 8 + 7] + 4;
      a[i * 8 + 4] = b[i * 8 + 7] + 5;
      a[i * 8 + 5] = b[i * 8 + 7] + 6;
      a[i * 8 + 6] = b[i * 8 + 7] + 7;
      a[i * 8 + 7] = b[i * 8 + 7] + 3;
    }
}

To enable VLA SLP auto-vectorization, we should be able to handle this following const vector:

1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }

2. NPATTERNS = 8, NELTS_PER_PATTERN = 1.
{ 1, 2, 8, 4, 5, 6, 7, 3, ... }

And these vector can be generated at prologue.

After this patch, we end up with this following codegen:

Prologue:
...
        vsetvli a7,zero,e16,m2,ta,ma
        vid.v   v4
        vsrl.vi v4,v4,3
        li      a3,8
        vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
...
        li      t1,67633152
        addi    t1,t1,513
        li      a3,50790400
        addi    a3,a3,1541
        slli    a3,a3,32
        add     a3,a3,t1
        vsetvli t1,zero,e64,m1,ta,ma
        vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
...
LoopBody:
...
        min     a3,...
        vsetvli zero,a3,e8,m1,ta,ma
        vle8.v  v2,0(a6)
        vsetvli a7,zero,e8,m1,ta,ma
        vrgatherei16.vv v1,v2,v4
        vadd.vv v1,v1,v3
        vsetvli zero,a3,e8,m1,ta,ma
        vse8.v  v1,0(a2)
        add     a6,a6,a4
        add     a2,a2,a4
        mv      a3,a5
        add     a5,a5,t1
        bgtu    a3,a4,.L3
...

Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8
since "vrgatherei16.vv" can cover larger range than "vrgather.vv" (which
only can maximum element index = 255).

Epilogue:
        lbu     a5,799(a1)
        addiw   a4,a5,1
        sb      a4,792(a0)
        addiw   a4,a5,2
        sb      a4,793(a0)
        addiw   a4,a5,8
        sb      a4,794(a0)
        addiw   a4,a5,4
        sb      a4,795(a0)
        addiw   a4,a5,5
        sb      a4,796(a0)
        addiw   a4,a5,6
        sb      a4,797(a0)
        addiw   a4,a5,7
        sb      a4,798(a0)
        addiw   a5,a5,3
        sb      a5,799(a0)
        ret

There is one more last thing we need to do is the "Epilogue auto-vectorization"
which needs VLS modes support. I will support VLS modes for
"Epilogue auto-vectorization" in the future.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
* config/riscv/riscv-v.cc
(rvv_builder::can_duplicate_repeating_sequence_p): Support POLY
handling.
(rvv_builder::single_step_npatterns_p): New function.
(rvv_builder::npatterns_all_equal_p): Ditto.
(const_vec_all_in_range_p): Support POLY handling.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Add vrgatherei16.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_const_vector): Add VLA SLP const vector support.
(expand_vec_perm): Support POLY.
(struct expand_vec_perm_d): New struct.
(shuffle_generic_patterns): New function.
(expand_vec_perm_const_1): Ditto.
(expand_vec_perm_const): Ditto.
* config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
(TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA
vectorizer.
* gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.

Handle const_int in expand_single_bit_test

After expanding directly to rtl instead of
creating a tree, we could end up with
a const_int which is not ready to be handled
by extract_bit_field.
So need to the constant folding here instead.

OK? bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR middle-end/110117

gcc/ChangeLog:

* expr.cc (expand_single_bit_test): Handle
const_int from expand_expr.

gcc/testsuite/ChangeLog:

* gcc.dg/pr110117-1.c: New test.
* gcc.dg/pr110117-2.c: New test.

Improve do_store_flag for single bit when there is no non-zero bits

In r14-1534-g908e5ab5c11c, I forgot you could turn off CCP or
turn off the bit tracking part of CCP so we would lose out
what TER was able to do before hand. This moves around the
TER code so that it is used instead of just the nonzerobits.
It also makes it easier to remove the TER part of the code
later on too.

OK? Bootstrapped and tested on x86_64-linux-gnu.

Note it reintroduces PR 110117 (which was accidently fixed after
r14-1534-g908e5ab5c11c). The next patch in series will fix that.

gcc/ChangeLog:

* expr.cc (do_store_flag): Rearrange the
TER code so that it overrides the nonzero bits
info if we had `a & POW2`.

For the `-A CMP -B -> B CMP A` pattern allow EQ/NE for all integer types

I noticed while looking at some code generation issue, that forwprop
was not handling `-a == 0` for unsigned types and I was confused why
it was not.
r6-1814-g66e1cacf608045 removed these from fold because they
were supposed to be already handled by the match.pd patterns
but it was missed that the match.pd patterns checked
TYPE_OVERFLOW_UNDEFINED while fold didn't do that for NE/EQ.
This patch removes the restriction on NE/EQ on TYPE_OVERFLOW_UNDEFINED.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/110134
* match.pd (-A CMP -B -> B CMP A): Allow EQ/NE for all integer
types.
(-A CMP CST -> B CMP (-CST)): Likewise.

gcc/testsuite/ChangeLog:

PR tree-optimization/110134
* gcc.dg/tree-ssa/negneq-1.c: New test.
* gcc.dg/tree-ssa/negneq-2.c: New test.
* gcc.dg/tree-ssa/negneq-3.c: New test.
* gcc.dg/tree-ssa/negneq-4.c: New test.

libiberty: writeargv: Simplify function error mode.

You are right, this is also a remnant of the old function design
that I completely missed.    Here is the follow-up patch for that.

Thanks for pointing it out.

Costas

On Tue, 6 Jun 2023 at 04:12, Jeff Law <jeffreyalaw@gmail.com> wrote:

    On 6/5/23 08:37, Costas Argyris via Gcc-patches wrote:
    > writeargv can be simplified by getting rid of the error exit mode
    > that was only relevant many years ago when the function used
    > to open the file descriptor internally.
    [ ... ]
    Thanks.  I've pushed this to the trunk.

    You could (as a follow-up) simplify it even further.  There's no need
    for the status variable as far as I can tell.  You could just have the
    final return be "return 0;" instead of "return status;".

libiberty/
* argv.c (writeargv): Constant propagate "0" for "status",
simplifying the code slightly.

Add match patterns for `a ? onezero : onezero` where one of the two operands are constant

This adds a match pattern that are for boolean values
that optimizes `a ? onezero : 0` to `a & onezero` and
`a ? 1 : onezero` to `a | onezero`.

This was reported a few times and I thought I would finally
add the match pattern for this.

This hits a few times in GCC itself too.

Notes on the testcases:
* phi-opt-2.c: This now is optimized to `a & b` in phiopt rather than ifcombine
* phi-opt-25b.c: The test part that was failing was parity which now gets `x & y` treatment.
* ssa-thread-21.c: there is no longer a threading opportunity, so need to disable phiopt.
Note PR 109957 is filed for the now missing optimization in that testcase too.

gcc/ChangeLog:

PR tree-optimization/89263
PR tree-optimization/99069
PR tree-optimization/20083
PR tree-optimization/94898
* match.pd: Add patterns to optimize `a ? onezero : onezero` with
one of the operands are constant.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-2.c: Adjust the testcase.
* gcc.dg/tree-ssa/phi-opt-25b.c: Adjust the testcase.
* gcc.dg/tree-ssa/ssa-thread-21.c: Disable phiopt.
* gcc.dg/tree-ssa/phi-opt-27.c: New test.
* gcc.dg/tree-ssa/phi-opt-28.c: New test.
* gcc.dg/tree-ssa/phi-opt-29.c: New test.
* gcc.dg/tree-ssa/phi-opt-30.c: New test.
* gcc.dg/tree-ssa/phi-opt-31.c: New test.
* gcc.dg/tree-ssa/phi-opt-32.c: New test.

Match: zero_one_valued_p should match 0 constants too

While working on `bool0 ? bool1 : bool2` I noticed that
zero_one_valued_p does not match on the constant zero
as in that case tree_nonzero_bits will return 0 and
that is different from 1.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (zero_one_valued_p): Match 0 integer constant
too.

RISC-V: Fix ICE when include riscv_vector.h with rv64gcv

This patch would like to fix the incorrect requirement of the vector
builtin types for the ZVFH/ZVFHMIN extension. The incorrect requirement
will result in the ops mismatch with iterators, and then ICE will be
triggered if ZVFH/ZVFHMIN is not given.

Sorry for inconviensient.

Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): Take RVV_REQUIRE_ELEN_FP_16 as requirement.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vint16mf4_t): Ditto.
(vint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16mf4_t): Ditto.
(vuint16mf2_t): Ditto.
(vuint16m1_t): Ditto.
(vuint16m2_t): Ditto.
(vuint16m4_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): Ditto.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.

c++: Add -Wnrvo

While looking at PRs about cases where we don't perform the named return
value optimization, it occurred to me that it might be useful to have a
warning for that.

This does not fix PR58487, but might be interesting to people watching it.

PR c++/58487

gcc/c-family/ChangeLog:

* c.opt: Add -Wnrvo.

gcc/ChangeLog:

* doc/invoke.texi: Document it.

gcc/cp/ChangeLog:

* typeck.cc (want_nrvo_p): New.
(check_return_expr): Handle -Wnrvo.

gcc/testsuite/ChangeLog:

* g++.dg/opt/nrv25.C: New test.

c++: enable NRVO from inner block [PR51571]

Our implementation of the named return value optimization has been limited
to variables declared in the outermost block of the function, to avoid
needing to handle the case where the variable needs to be destroyed due to
going out of scope. PR92407 pointed out a case we were missing, where the
variable goes out of scope due to a goto and we were failing to destroy it.

It occurred to me that this problem is the flip side of PR33799, where we
need to be sure to destroy the return value if a cleanup throws on return;
here we want to avoid destroying the return value when exiting the
variable's scope on return. We can use the same flag to indicate to both
cleanups that we're returning.

This implements the guaranteed copy elision specified by P2025 (which is not
yet part of the draft standard).

PR c++/51571
PR c++/92407

gcc/cp/ChangeLog:

* decl.cc (finish_function): Simplify NRV handling.
* except.cc (maybe_set_retval_sentinel): Also set if NRV.
(maybe_splice_retval_cleanup): Don't add the cleanup region
if we don't need it.
* semantics.cc (nrv_data): Add simple field.
(finalize_nrv): Set it.
(finalize_nrv_r): Check it and retval sentinel.
* cp-tree.h (finalize_nrv): Adjust declaration.
* typeck.cc (check_return_expr): Remove named_labels check.

gcc/testsuite/ChangeLog:

* g++.dg/opt/nrv23.C: New test.