This one line change resolves the failure of gcc.target/i386/rotate-2.c
when compiled with -march=cascadelake triggered by recent STV improvements.
https://gcc.gnu.org/pipermail/gcc-patches/2026-May/716996.html
The decision of whether to perform STV is finely balanced, and affected
by the microarchitecture's timings/costs, but in this case the underlying
issue appears to be the parameterized cost for performing a 128-bit
rotation by a constant in SSE registers. Depending upon the number
of bits to rotate by, SSE requires either 1 or 2 shuffles, followed
by a left shift, a right shift and an any_or_plus to combine the result.
This is therefore 4 or 5 instructions, but currently returns
COSTS_N_INSNS(1) instead of COSTS_N_INSNS(4) [probably a typo].
As an aside, it might be more useful for this gain to based on latency;
as both the shuffles and the shifts can each be performed in parallel,
so a reasonable vcost may therefore be COSTS_N_INSNS(3), but such fine
tuning might require microbenchmarking. I mention it here just in case
using COSTS_N_INSNS(4) is bisected as a performance regression.
2026-05-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Tweak
the cost of a 128-bit rotation to be 4 (or 5) instructions.
else if (op1val > 32 && op1val < 96)
vcost = COSTS_N_INSNS (5);
else
- vcost = COSTS_N_INSNS (1);
+ vcost = COSTS_N_INSNS (4);
}
igain = scost - vcost;
break;