The following adjusts mask recording, which didn't take into account
that we can merge call arguments from two vectors, as in
_50 = {vect_d_1.253_41, vect_d_1.254_43};
_51 = VIEW_CONVERT_EXPR<unsigned char>(mask__19.257_49);
_52 = (unsigned int) _51;
_53 = _Z3bazd.simdclone.7 (_50, _52);
_54 = BIT_FIELD_REF <_53, 256, 0>;
_55 = BIT_FIELD_REF <_53, 256, 256>;
The testcase g++.dg/vect/pr68762-2.cc exercises this on x86_64 with
partial vector usage enabled and AVX512 support.
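
As a rough worked example (the lane counts are only inferred from the dump
above, not stated in the PR): the merged argument _50 packs two 4-lane
vectors, so the clone's simdlen is 8 while vectype presumably has 4
subparts; with ncopies == 1 the corrected computation records

  nmasks = exact_div (ncopies * simdlen, TYPE_VECTOR_SUBPARTS (vectype))
         = exact_div (1 * 8, 4)
         = 2

loop masks, whereas the old code recorded only ncopies == 1 of them.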
PR tree-optimization/115868
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Correctly
compute the number of mask copies required for vect_record_loop_mask.
	  case SIMD_CLONE_ARG_TYPE_MASK:
	    if (loop_vinfo
		&& LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
-	      vect_record_loop_mask (loop_vinfo,
-				     &LOOP_VINFO_MASKS (loop_vinfo),
-				     ncopies, vectype, op);
+	      {
+		unsigned nmasks
+		  = exact_div (ncopies * bestn->simdclone->simdlen,
+			       TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
+		vect_record_loop_mask (loop_vinfo,
+				       &LOOP_VINFO_MASKS (loop_vinfo),
+				       nmasks, vectype, op);
+	      }
	    break;
	  }