With r16-531 we've regressed
FAIL: gcc.target/i386/pr108938-3.c scan-assembler-times bswap[\t ]+ 3
on ia32 and also introduced two separate regressions in the same testcase
on x86_64-linux (which in scan-assembler-times cancel out; previously
we were generating one 64-bit bswap + rotate in one function and
one 32-bit bswap + rotate in another function, now we emit 2 32-bit bswaps
in the first one and really horrible code in the second one).
The following patch fixes the latter function by emitting 32-bit bswap
+ 32-bit rotate on both ia32 and x86_64. This fixes the
above FAIL (and introduces
FAIL: gcc.target/i386/pr108938-3.c scan-assembler-times bswap[\t ]+ 2
on x86_64).
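
For readers unfamiliar with the "bswap + rotate" form mentioned above, here is
a minimal sketch of the identity involved. It is not taken from pr108938-3.c;
the function names are hypothetical and the source pattern is only an
assumption about the kind of code that ends up as a 32-bit bswap followed by a
32-bit rotate. __builtin_bswap32 is the existing GCC builtin.

```c
#include <stdint.h>

/* Hypothetical source pattern: swap the two bytes inside each 16-bit half
   of a 32-bit value.  */
static uint32_t
swap_bytes_in_halves (uint32_t x)
{
  return ((x & 0xff00ff00u) >> 8) | ((x & 0x00ff00ffu) << 8);
}

/* The same value computed as a full 32-bit byte swap followed by a
   rotate by 16 bits.  */
static uint32_t
bswap_then_rotate (uint32_t x)
{
  uint32_t s = __builtin_bswap32 (x);
  return (s << 16) | (s >> 16);
}
```

For x = 0x11223344 both functions return 0x22114433.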
The problem is that the vectorizer now uses VEC_PACK_TRUNC_EXPR and
bswap/store_merging was only able to handle vectors in a CONSTRUCTOR.
The patch adds handling of VEC_PACK_TRUNC_EXPR if its operands are
CONSTRUCTORs. Without a testcase, I wasn't confident enough to write
BYTES_BIG_ENDIAN support: for CONSTRUCTOR { A, B, C, D } we make
DCBA out of it on little endian but ABCD on big endian, and that would
need to be combined with picking up the most significant halves of each
element.
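
As a rough illustration of the element ordering described above, the following
scalar model (not GCC code; all names are hypothetical) mimics
VEC_PACK_TRUNC_EXPR on two 2-element vectors of 16-bit values, then reads the
resulting 4-byte vector as a 32-bit integer in each byte order. It assumes the
documented semantics of VEC_PACK_TRUNC_EXPR: each element is truncated to half
its width and the first operand's elements precede the second's.

```c
#include <stdint.h>

/* Scalar model of VEC_PACK_TRUNC_EXPR <a, b>: truncate every element to
   half its width, elements of A first, then elements of B.  */
static void
pack_trunc (const uint16_t a[2], const uint16_t b[2], uint8_t out[4])
{
  out[0] = (uint8_t) a[0];
  out[1] = (uint8_t) a[1];
  out[2] = (uint8_t) b[0];
  out[3] = (uint8_t) b[1];
}

/* Value of the vector { A, B, C, D } read as a little-endian integer:
   element 0 is the least significant byte, i.e. DCBA.  */
static uint32_t
as_le32 (const uint8_t v[4])
{
  return (uint32_t) v[0] | ((uint32_t) v[1] << 8)
	 | ((uint32_t) v[2] << 16) | ((uint32_t) v[3] << 24);
}

/* The same vector read as a big-endian integer: element 0 is the most
   significant byte, i.e. ABCD.  */
static uint32_t
as_be32 (const uint8_t v[4])
{
  return ((uint32_t) v[0] << 24) | ((uint32_t) v[1] << 16)
	 | ((uint32_t) v[2] << 8) | (uint32_t) v[3];
}
```

With a = { 0x11aa, 0x22bb } and b = { 0x33cc, 0x44dd }, the packed vector is
{ 0xaa, 0xbb, 0xcc, 0xdd }, which reads as 0xddccbbaa on little endian and
0xaabbccdd on big endian.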
I'll look incrementally at the other function.
2026-02-17 Jakub Jelinek <jakub@redhat.com>
PR target/120233
* gimple-ssa-store-merging.cc (find_bswap_or_nop_2): New function.
(find_bswap_or_nop): Move CONSTRUCTOR handling to above function,
call it instead of find_bswap_or_nop_1.
(bswap_replace): Handle VEC_PACK_TRUNC_EXPR like CONSTRUCTOR.
(maybe_optimize_vector_constructor): Likewise.
(pass_optimize_bswap::execute): Likewise.
(get_status_for_store_merging): Likewise.
(pass_store_merging::execute): Likewise.