From: Arnd Bergmann
Date: Thu, 11 Jun 2026 12:59:39 +0000 (+0200)
Subject: lib/crypto: gf128hash: mark clmul32() as noinline_for_stack
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=065f978a0e015c4dd9f536f5c08078a37f5509c1;p=thirdparty%2Flinux.git

lib/crypto: gf128hash: mark clmul32() as noinline_for_stack

During randconfig testing, I came across many warnings about the newly
added carryless multiplication function causing excessive stack usage by
spilling temporary variables to the stack:

lib/crypto/gf128hash.c:166:1: error: stack frame size (1192) exceeds limit (1024) in 'polyval_mul_generic' [-Werror,-Wframe-larger-than]

In addition to the possible risk of overflowing the kernel stack, the
generated object code surely performs very poorly.

This only happens on architectures that don't provide a 128-bit integer
type (CONFIG_ARCH_SUPPORTS_INT128), which should be all 32-bit
architectures on modern compilers. Although I tested random x86 and arm
configs, I only saw it with arm's CONFIG_THUMB2_KERNEL, which puts more
pressure on the register allocator. The testing was done using clang-22;
I don't know whether gcc has the same problem.

Experimentally, marking clmul32() as noinline_for_stack completely solves
the problem in all of the affected builds, reducing the stack usage to a
few bytes as expected.

Since compilers frequently optimize u64 arithmetic badly on 32-bit
targets, keeping clmul32() out of line is likely to help other 32-bit
configurations that run into this problem as well, though it may cause a
small performance degradation in configurations that would benefit from
inlining.
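To make the 32 x 32 => 64 bit carryless multiplication more concrete, here is a minimal, portable C sketch of one common constant-time approach. This is not the kernel's clmul32(). Its body is only partly visible in this patch, although the "one term every 4 bits" wording in its comment appears to refer to the same idea. Each operand is masked so that only one bit in every four survives, the pieces are multiplied with ordinary integer multiplication, and the carries are masked away. The name clmul32_sketch() is hypothetical. noinline_for_stack is redefined locally as a plain noinline attribute, on the assumption that this matches the kernel's definition. The sixteen 64-bit products are the kind of u64 arithmetic that a 32-bit compiler has to split into register pairs, which is plausibly where the register pressure comes from.

```c
#include <stdint.h>

/* Assumed stand-in for the kernel macro: just forbid inlining. */
#define noinline_for_stack __attribute__((__noinline__))

/*
 * Hypothetical sketch: 32 x 32 => 64 bit carryless multiplication using
 * integer multiplies on operands that keep only one bit in every four.
 */
static noinline_for_stack uint64_t clmul32_sketch(uint32_t a, uint32_t b)
{
	const uint32_t m0 = 0x11111111, m1 = 0x22222222,
		       m2 = 0x44444444, m3 = 0x88888888;
	uint64_t a0 = a & m0, a1 = a & m1, a2 = a & m2, a3 = a & m3;
	uint64_t b0 = b & m0, b1 = b & m1, b2 = b & m2, b3 = b & m3;
	uint64_t r0, r1, r2, r3;

	/* ai * bj only produces terms at bit positions == i + j (mod 4). */
	r0 = (a0 * b0) ^ (a1 * b3) ^ (a2 * b2) ^ (a3 * b1);
	r1 = (a0 * b1) ^ (a1 * b0) ^ (a2 * b3) ^ (a3 * b2);
	r2 = (a0 * b2) ^ (a1 * b1) ^ (a2 * b0) ^ (a3 * b3);
	r3 = (a0 * b3) ^ (a1 * b2) ^ (a2 * b1) ^ (a3 * b0);

	/* Keep only the column bits; the carries live in the bits between. */
	return (r0 & 0x1111111111111111ULL) | (r1 & 0x2222222222222222ULL) |
	       (r2 & 0x4444444444444444ULL) | (r3 & 0x8888888888888888ULL);
}
```

The masking works because each column of a single partial product collects at most eight one-bits. Its integer sum therefore fits in four bits, so carries can only land in the three masked-off bits above the column and never reach the next column that is kept. The lowest bit of the sum is the XOR (parity) of the terms, which is exactly the carryless result.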
Signed-off-by: Arnd Bergmann
Link: https://patch.msgid.link/20260611125952.3387258-1-arnd@kernel.org
Signed-off-by: Eric Biggers
---
diff --git a/lib/crypto/gf128hash.c b/lib/crypto/gf128hash.c
index 2650603d8ba85..8dcdf5ec98be0 100644
--- a/lib/crypto/gf128hash.c
+++ b/lib/crypto/gf128hash.c
@@ -109,7 +109,7 @@ static void clmul64(u64 a, u64 b, u64 *out_lo, u64 *out_hi)
 #else /* CONFIG_ARCH_SUPPORTS_INT128 */
 
 /* Do a 32 x 32 => 64 bit carryless multiplication. */
-static u64 clmul32(u32 a, u32 b)
+static noinline_for_stack u64 clmul32(u32 a, u32 b)
 {
 	/*
 	 * With 32-bit multiplicands and one term every 4 bits, there are up to
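On the caller side, the hunk header shows a clmul64(u64 a, u64 b, u64 *out_lo, u64 *out_hi) helper, and clmul32() lives in the branch for targets without CONFIG_ARCH_SUPPORTS_INT128. One plausible way to build the 64 x 64 => 128 bit product there is Karatsuba's trick with three calls to a 32-bit helper. This patch does not show how gf128hash.c actually does it, so treat the following as an illustrative sketch only. With the helper kept out of line, each call reuses one small, fixed stack frame instead of having the helper's temporaries inlined into the caller. The names clmul32_slow() and clmul64_karatsuba() are hypothetical, and the local noinline_for_stack definition is an assumed stand-in for the kernel macro. clmul32_slow() is a simple shift-and-XOR loop chosen for brevity, and unlike code meant for real crypto use it is not constant time.

```c
#include <stdint.h>

/* Assumed stand-in for the kernel macro: just forbid inlining. */
#define noinline_for_stack __attribute__((__noinline__))

/* Hypothetical 32 x 32 => 64 bit carryless multiply; NOT constant time. */
static noinline_for_stack uint64_t clmul32_slow(uint32_t a, uint32_t b)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 32; i++)
		if (b & ((uint32_t)1 << i))
			r ^= (uint64_t)a << i;
	return r;
}

/* Hypothetical 64 x 64 => 128 bit carryless multiply via Karatsuba. */
static void clmul64_karatsuba(uint64_t a, uint64_t b,
			      uint64_t *out_lo, uint64_t *out_hi)
{
	uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
	uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);
	uint64_t lo = clmul32_slow(a0, b0);
	uint64_t hi = clmul32_slow(a1, b1);
	/* In GF(2)[x] addition is XOR: mid = a0*b1 + a1*b0. */
	uint64_t mid = clmul32_slow(a0 ^ a1, b0 ^ b1) ^ lo ^ hi;

	/* Result = hi * x^64 + mid * x^32 + lo, split into two 64-bit words. */
	*out_lo = lo ^ (mid << 32);
	*out_hi = hi ^ (mid >> 32);
}
```

Three 32-bit products instead of four is the usual reason to use Karatsuba here. It also keeps the caller's own live state down to a handful of u64 values, while the heavy lifting stays inside the out-of-line helper.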