From: Nathan Moinvaziri
Date: Sun, 11 Jan 2026 19:32:44 +0000 (-0800)
Subject: Eliminate extra vmovdqu instruction folding xmm into zmm.
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=26ab3c5e29afe63a2435fa0b1064449614d45bb7;p=thirdparty%2Fzlib-ng.git

Eliminate extra vmovdqu instruction folding xmm into zmm.

Fixed by using _mm512_castsi128_si512() and removing redundant insert.
---

diff --git a/arch/x86/crc32_pclmulqdq_tpl.h b/arch/x86/crc32_pclmulqdq_tpl.h
index e951c0579..a7b8edfdd 100644
--- a/arch/x86/crc32_pclmulqdq_tpl.h
+++ b/arch/x86/crc32_pclmulqdq_tpl.h
@@ -264,7 +264,7 @@ Z_FORCEINLINE static uint32_t crc32_copy_impl(uint32_t crc, uint8_t *dst, const
         }
 
         // Fold existing xmm state into first 64 bytes
-        zmm_t0 = _mm512_inserti32x4(_mm512_setzero_si512(), xmm_crc0, 0);
+        zmm_t0 = _mm512_castsi128_si512(xmm_crc0);
         zmm_t0 = _mm512_inserti32x4(zmm_t0, xmm_crc1, 1);
         zmm_t0 = _mm512_inserti32x4(zmm_t0, xmm_crc2, 2);
         zmm_t0 = _mm512_inserti32x4(zmm_t0, xmm_crc3, 3);
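
Note on the change: _mm512_castsi128_si512() leaves the upper 384 bits of its
result undefined, so the replacement is only equivalent to the zero-and-insert
form because the three inserts that follow overwrite lanes 1 to 3. The
standalone sketch below is not part of zlib-ng; it packs four 128-bit values
both ways and compares the results. The names pack_zero_insert, pack_cast and
packings_match are hypothetical, and the target attribute and
__builtin_cpu_supports() assume GCC or Clang on x86.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Build only these functions for AVX-512F (GCC/Clang attribute), so the
 * file compiles without -mavx512f on the command line. */
#define TARGET_AVX512F __attribute__((target("avx512f")))

/* Pre-patch form: lane 0 is inserted into an explicitly zeroed register. */
TARGET_AVX512F
static void pack_zero_insert(const uint32_t in[16], uint32_t out[16]) {
    __m512i z = _mm512_inserti32x4(_mm512_setzero_si512(),
                                   _mm_loadu_si128((const __m128i *)(in + 0)), 0);
    z = _mm512_inserti32x4(z, _mm_loadu_si128((const __m128i *)(in + 4)), 1);
    z = _mm512_inserti32x4(z, _mm_loadu_si128((const __m128i *)(in + 8)), 2);
    z = _mm512_inserti32x4(z, _mm_loadu_si128((const __m128i *)(in + 12)), 3);
    _mm512_storeu_si512(out, z);
}

/* Post-patch form: lane 0 comes from a cast. Lanes 1..3 are undefined after
 * the cast, so all three must be written before the value is used. */
TARGET_AVX512F
static void pack_cast(const uint32_t in[16], uint32_t out[16]) {
    __m512i z = _mm512_castsi128_si512(_mm_loadu_si128((const __m128i *)(in + 0)));
    z = _mm512_inserti32x4(z, _mm_loadu_si128((const __m128i *)(in + 4)), 1);
    z = _mm512_inserti32x4(z, _mm_loadu_si128((const __m128i *)(in + 8)), 2);
    z = _mm512_inserti32x4(z, _mm_loadu_si128((const __m128i *)(in + 12)), 3);
    _mm512_storeu_si512(out, z);
}

/* Returns 1 if both forms produce the same 64 bytes (and match the input),
 * 0 if they differ, -1 if the CPU does not report AVX-512F. */
int packings_match(void) {
    uint32_t in[16], a[16], b[16];

    if (!__builtin_cpu_supports("avx512f"))
        return -1;

    for (uint32_t i = 0; i < 16; i++)
        in[i] = 0x01010101u * (i + 1);  /* distinct value per 32-bit element */

    pack_zero_insert(in, a);
    pack_cast(in, b);
    return memcmp(a, b, sizeof(a)) == 0 && memcmp(a, in, sizeof(a)) == 0;
}
```

Calling packings_match() on an AVX-512F machine should return 1. Whether the
cast form actually saves an instruction depends on the compiler and flags, so
the generated assembly is worth checking for the build in question.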