kmovd only uses port5 which is often the bottleneck of
performance. Also from latency perspective, spill and reload mostly
could be STLF or even MRN which only take 1 cycle.
So the patch increase move cost between gpr and mask to be the same as
gpr <-> sse register.
gcc/ChangeLog:
* config/i386/x86-tune-costs.h (skylake_cost): Increase gpr
<-> mask cost from 5 to 6.
(icelake_cost): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/spill_to_mask-1.c: New test.