Also handle avx512 kmask & immediate 15 or 3 when VF is 4/2.
Like r16-105-g599bca27dc37b3, the patch handles redundant cleanup of
upper bits for maskload, i.e.
Successfully matched this instruction:
(set (reg:V4DF 175)
    (vec_merge:V4DF (unspec:V4DF [
                (mem:V4DF (plus:DI (reg/v/f:DI 155 [ b ])
                        (reg:DI 143 [ ivtmp.56 ])) [1 S32 A64])
            ] UNSPEC_MASKLOAD)
        (const_vector:V4DF [
                (const_double:DF 0.0 [0x0.0p+0]) repeated x4
            ])
        (and:QI (reg:QI 125 [ mask__29.16 ])
            (const_int 15 [0xf]))))
For maskstore, the generated code already looks optimal (at least I
can't construct a testcase that shows otherwise), so the patch only
handles maskload.
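
Why the AND is redundant can be illustrated with a small scalar model.
This is only a sketch, not code from the patch: model_maskload_v4df and
and15_is_redundant are hypothetical names, and the model assumes
zero-masking semantics where a 4-lane load examines only mask bits 0..3.

```c
#include <stdint.h>
#include <string.h>

#define VF 4  /* number of lanes in a V4DF vector */

/* Hypothetical scalar model of a zero-masking V4DF load: lane i is
   loaded only if bit i of the mask is set, otherwise it becomes 0.0.
   Bits VF..7 of the mask are never examined.  */
void
model_maskload_v4df (double dst[VF], const double *src, uint8_t k)
{
  for (int i = 0; i < VF; i++)
    dst[i] = ((k >> i) & 1) ? src[i] : 0.0;
}

/* Return 1 if applying "& 0xf" to the mask first gives the same result
   as using the mask directly, for every possible 8-bit mask value.  */
int
and15_is_redundant (const double *src)
{
  for (unsigned k = 0; k < 256; k++)
    {
      double with_and[VF], without_and[VF];
      model_maskload_v4df (with_and, src, (uint8_t) (k & 0xf));
      model_maskload_v4df (without_and, src, (uint8_t) k);
      if (memcmp (with_and, without_and, sizeof with_and) != 0)
        return 0;
    }
  return 1;
}
```

Under this model the upper four mask bits never influence the loaded
value, so the (and:QI ... (const_int 15)) can be dropped; the same
reasoning applies to "& 3" when VF is 2.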

gcc/ChangeLog:

	PR target/103750
	* config/i386/i386.cc (ix86_rtx_costs): Adjust rtx_cost for
	maskload.
	* config/i386/sse.md (*<avx512>_load<mode>mask_and15): New
	define_insn_and_split.
	(*<avx512>_load<mode>mask_and3): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512f-pr103750-3.c: New test.