Also handle avx512 kmask & immediate 15 or 3 when VF is 4/2.
author    liuhongt <hongtao.liu@intel.com>
          Tue, 3 Jun 2025 06:12:23 +0000 (14:12 +0800)
committer liuhongt <hongtao.liu@intel.com>
          Mon, 9 Jun 2025 02:21:00 +0000 (10:21 +0800)
commit    cdfa5fe03512f7ac5a293480f634df68fc973060
tree      cf13b1e7cb0c278e97e7889ffbb0d34653c5b752
parent    8d745f6d70172132a594dcc650a6d489e7246eda
Also handle avx512 kmask & immediate 15 or 3 when VF is 4/2.

Like r16-105-g599bca27dc37b3, this patch handles the redundant clean-up
of upper bits for maskload, i.e.:

Successfully matched this instruction:
(set (reg:V4DF 175)
    (vec_merge:V4DF (unspec:V4DF [
                (mem:V4DF (plus:DI (reg/v/f:DI 155 [ b ])
                        (reg:DI 143 [ ivtmp.56 ])) [1  S32 A64])
            ] UNSPEC_MASKLOAD)
        (const_vector:V4DF [
                (const_double:DF 0.0 [0x0.0p+0]) repeated x4
            ])
        (and:QI (reg:QI 125 [ mask__29.16 ])
            (const_int 15 [0xf]))))

For maskstore, the generated code looks already optimal (at least I
can't construct a testcase), so the patch only handles maskload.

gcc/ChangeLog:

PR target/103750
* config/i386/i386.cc (ix86_rtx_costs): Adjust rtx_cost for
maskload.
* config/i386/sse.md (*<avx512>_load<mode>mask_and15): New
define_insn_and_split.
(*<avx512>_load<mode>mask_and3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512f-pr103750-3.c: New test.
gcc/config/i386/i386.cc
gcc/config/i386/sse.md
gcc/testsuite/gcc.target/i386/avx512f-pr103750-3.c [new file with mode: 0644]