From: Roger Sayle
Date: Thu, 28 May 2026 19:54:17 +0000 (+0100)
Subject: x86_64 SSE: Tweak/correct STV cost of 128-bit rotate by constant.
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=0caa152ba34d2cf53a6555455fa10d6130fd7dc5;p=thirdparty%2Fgcc.git

x86_64 SSE: Tweak/correct STV cost of 128-bit rotate by constant.

This one-line change resolves the failure of gcc.target/i386/rotate-2.c
with -march=cascadelake that was triggered by recent STV improvements:
https://gcc.gnu.org/pipermail/gcc-patches/2026-May/716996.html

The decision of whether to perform STV is finely balanced and depends
on the microarchitecture's timings/costs, but in this case the
underlying issue appears to be the parameterized cost of performing a
128-bit rotation by a constant in SSE registers.  Depending upon the
number of bits to rotate by, SSE requires either 1 or 2 shuffles,
followed by a left shift, a right shift and an any_or_plus to combine
the results.  That is 4 or 5 instructions, but the 4-instruction case
currently returns COSTS_N_INSNS(1) instead of COSTS_N_INSNS(4)
[probably a typo].

As an aside, it might be more useful for this gain to be based on
latency: the shuffles can be performed in parallel with each other, as
can the two shifts, so a reasonable vcost might be COSTS_N_INSNS(3),
but such fine tuning might require microbenchmarking.  I mention it
here just in case using COSTS_N_INSNS(4) is bisected as a performance
regression.

2026-05-28  Roger Sayle

gcc/ChangeLog

	* config/i386/i386-features.cc (compute_convert_gain): Tweak the
	cost of a 128-bit rotation to be 4 (or 5) instructions.
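To make the "1 or 2 shuffles, then two shifts and a combine" count concrete, here is a hedged C++ sketch that models one possible SSE2 sequence (PSHUFD, PSLLD, PSRLD, POR on four 32-bit lanes) in plain scalar code, so it runs on any host. It is built from the description in the commit message, not copied from GCC's expander; the names `Xmm`, `Seq`, `lane_rotate_imm`, `rotl128_sse` and `rotl128_ref` are hypothetical. Rotations by multiples of 32 are excluded because a single lane shuffle already performs them.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of an XMM register as four 32-bit lanes; lane 0 is least significant.
using Xmm = std::array<std::uint32_t, 4>;

// Evaluates modelled SSE2 instructions and counts how many were "emitted".
struct Seq {
  int insns = 0;
  int shuffles = 0;

  // PSHUFD: dst[i] = src[(imm >> 2*i) & 3].
  Xmm pshufd(const Xmm& src, unsigned imm) {
    ++insns;
    ++shuffles;
    Xmm dst{};
    for (int i = 0; i < 4; ++i) dst[i] = src[(imm >> (2 * i)) & 3u];
    return dst;
  }
  // PSLLD / PSRLD by an immediate count in 1..31, applied to each 32-bit lane.
  Xmm pslld(const Xmm& a, int count) {
    ++insns;
    Xmm d{};
    for (int i = 0; i < 4; ++i) d[i] = a[i] << count;
    return d;
  }
  Xmm psrld(const Xmm& a, int count) {
    ++insns;
    Xmm d{};
    for (int i = 0; i < 4; ++i) d[i] = a[i] >> count;
    return d;
  }
  // POR: the two shifted values share no set bits, so PLUS or XOR would give the same result.
  Xmm por(const Xmm& a, const Xmm& b) {
    ++insns;
    Xmm d{};
    for (int i = 0; i < 4; ++i) d[i] = a[i] | b[i];
    return d;
  }
};

// PSHUFD immediate that moves every lane up by s positions: dst[i] = src[(i - s) mod 4].
constexpr unsigned lane_rotate_imm(int s) {
  unsigned imm = 0;
  for (int i = 0; i < 4; ++i)
    imm |= static_cast<unsigned>(((i - s) % 4 + 4) % 4) << (2 * i);
  return imm;
}

// Rotate a 128-bit value left by n (0 < n < 128, n not a multiple of 32).
// Result lane i = (x[i-k] << r) | (x[i-k-1] >> (32-r)), with n = 32*k + r.
Xmm rotl128_sse(const Xmm& x, int n, Seq& seq) {
  assert(n > 0 && n < 128 && n % 32 != 0);
  const int k = n / 32;  // whole-lane part of the rotation
  const int r = n % 32;  // remaining bit count, 1..31
  auto lanes_moved_by = [&](int s) -> Xmm {
    if (s % 4 == 0) return x;  // lanes already in place: no shuffle needed
    return seq.pshufd(x, lane_rotate_imm(s));
  };
  const Xmm shl_src = lanes_moved_by(k);      // provides bits r..31 of each result lane
  const Xmm shr_src = lanes_moved_by(k + 1);  // provides bits 0..r-1 of each result lane
  return seq.por(seq.pslld(shl_src, r), seq.psrld(shr_src, 32 - r));
}

// Bit-by-bit reference: bit j of the result is bit (j - n) mod 128 of x.
Xmm rotl128_ref(const Xmm& x, int n) {
  Xmm d{};
  for (int j = 0; j < 128; ++j) {
    const int src = ((j - n) % 128 + 128) % 128;
    d[j / 32] |= ((x[src / 32] >> (src % 32)) & 1u) << (j % 32);
  }
  return d;
}
```

In this model both source operands need a shuffle only when the whole-lane part k is 1 or 2, i.e. for 32 < n < 96, which is exactly the range the `else if` branch in the patch charges COSTS_N_INSNS (5); every other non-multiple-of-32 amount uses one shuffle and four instructions in total.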
---
diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 4f3f50a6524..0694811e9da 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1867,7 +1867,7 @@ timode_scalar_chain::compute_convert_gain ()
             else if (op1val > 32 && op1val < 96)
               vcost = COSTS_N_INSNS (5);
             else
-              vcost = COSTS_N_INSNS (1);
+              vcost = COSTS_N_INSNS (4);
           }
         igain = scost - vcost;
         break;
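The commit message's aside contrasts pricing the rotate by instruction count (what the patch does) with pricing it by latency. The C++ sketch below illustrates that difference under loudly stated assumptions: every shuffle, shift and OR is assumed to take one cycle, and independent instructions are assumed to issue in parallel. `Dag`, `critical_path`, `vcost_by_count` and `vcost_by_latency` are hypothetical names, not GCC functions; only `COSTS_N_INSNS` mirrors GCC, whose rtl.h defines it as `((N) * 4)`.

```cpp
#include <algorithm>
#include <cassert>

// Same definition as GCC's rtl.h, repeated so the sketch stands alone.
#define COSTS_N_INSNS(N) ((N) * 4)

// Counts modelled instructions and tracks dependency depth, assuming unit latency.
struct Dag {
  int insns = 0;
  int op(int input_depth) { ++insns; return input_depth + 1; }
  int op(int a, int b) { ++insns; return std::max(a, b) + 1; }
};

// Hypothetical model of rotate-by-n (n not a multiple of 32): up to two shuffles of the
// input, a left shift, a right shift and an OR.  Returns the critical-path length.
int critical_path(int n, Dag& dag) {
  assert(n > 0 && n < 128 && n % 32 != 0);
  const int k = n / 32;
  const int input = 0;  // depth of the source register
  const int shl_src = (k == 0) ? input : dag.op(input);  // PSHUFD unless lanes already line up
  const int shr_src = (k == 3) ? input : dag.op(input);  // PSHUFD unless lanes already line up
  const int shl = dag.op(shl_src);                        // PSLLD
  const int shr = dag.op(shr_src);                        // PSRLD
  return dag.op(shl, shr);                                // POR
}

// Cost by instruction count (as in the patch) versus by critical path (the aside).
int vcost_by_count(int n) { Dag d; critical_path(n, d); return COSTS_N_INSNS(d.insns); }
int vcost_by_latency(int n) { Dag d; return COSTS_N_INSNS(critical_path(n, d)); }
```

Under these unit-latency assumptions the count-based cost is COSTS_N_INSNS (4) or (5), while every variant has a three-instruction critical path, which is where the suggested COSTS_N_INSNS (3) comes from. Real latencies and port pressure vary by microarchitecture, which is why the message defers such tuning to microbenchmarking.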