https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96166
--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:0f3a743b688f4845e1798eed9b2e2284e891da11

commit r11-7233-g0f3a743b688f4845e1798eed9b2e2284e891da11
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Sat Feb 13 10:32:16 2021 +0100

    i386: Add combiner splitter to optimize V2SImode memory rotation [PR96166]

    Since the x86 backend enabled V2SImode vectorization (with
    TARGET_MMX_WITH_SSE), slp vectorization can kick in and emit
            movq    (%rdi), %xmm1
            pshufd  $225, %xmm1, %xmm0
            movq    %xmm0, (%rdi)
    instead of
            rolq    $32, (%rdi)
    we used to emit (or emit when slp vectorization is disabled).
    I think the rotate is both smaller and faster, so this patch adds
    a combiner splitter to optimize that back.

    2021-02-13  Jakub Jelinek  <ja...@redhat.com>

            PR target/96166
            * config/i386/mmx.md (*mmx_pshufd_1): Add a combine splitter for swap
            of V2SImode elements in memory into DImode memory rotate by 32.

            * gcc.target/i386/pr96166.c: New test.
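
For readers following the PR, the equivalence the splitter relies on can be sketched in plain C: swapping the two 32-bit elements of a V2SImode-sized object in memory gives the same bytes as rotating the same 8 bytes, viewed as one 64-bit integer, by 32. This is only an illustration under stated assumptions; the function names `swap_halves` and `rotate_by_32` are hypothetical and this is not the content of gcc.target/i386/pr96166.c.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: swap the two 32-bit elements of a 2-element
   array in memory.  This is the kind of code the commit message says
   slp vectorization can turn into a movq/pshufd/movq sequence.  */
void swap_halves(uint32_t v[2])
{
    uint32_t tmp = v[0];
    v[0] = v[1];
    v[1] = tmp;
}

/* Hypothetical helper: the same operation written as a 64-bit rotate
   by 32.  memcpy is used to reinterpret the 8 bytes without violating
   aliasing rules.  Rotating by exactly half the width exchanges the
   two halves, so the result does not depend on byte order.  */
void rotate_by_32(uint32_t v[2])
{
    uint64_t x;
    memcpy(&x, v, sizeof x);
    x = (x << 32) | (x >> 32);
    memcpy(v, &x, sizeof x);
}
```

To check what a given compiler emits for either function, the assembly can be inspected with something like `gcc -O3 -S`; the exact output depends on the GCC version and on whether slp vectorization is enabled.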