-ftree-slp-vectorize turns ROL into a mess

cvs-commit at gcc dot gnu.org via Gcc-bugs Sat, 13 Feb 2021 01:33:19 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96166


--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:0f3a743b688f4845e1798eed9b2e2284e891da11

commit r11-7233-g0f3a743b688f4845e1798eed9b2e2284e891da11
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Sat Feb 13 10:32:16 2021 +0100

    i386: Add combiner splitter to optimize V2SImode memory rotation [PR96166]

    Since the x86 backend enabled V2SImode vectorization (with
    TARGET_MMX_WITH_SSE), slp vectorization can kick in and emit
            movq    (%rdi), %xmm1
            pshufd  $225, %xmm1, %xmm0
            movq    %xmm0, (%rdi)
    instead of
            rolq    $32, (%rdi)
    which we used to emit (and still emit when slp vectorization is disabled).
    I think the rotate is both smaller and faster, so this patch adds
    a combiner splitter to optimize that back.

    2021-02-13  Jakub Jelinek  <ja...@redhat.com>

            PR target/96166
            * config/i386/mmx.md (*mmx_pshufd_1): Add a combine splitter for
            swap of V2SImode elements in memory into DImode memory rotate by
            32.

            * gcc.target/i386/pr96166.c: New test.
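
For illustration only, here is a minimal C sketch of the equivalence the
commit message relies on: swapping the two 32-bit elements of a V2SImode
value in memory gives the same result as rotating the 64-bit value by 32.
It also decodes the pshufd immediate $225 to show that it swaps the two low
elements. The function names (swap_elements, rotate_by_32, pshufd_source)
are hypothetical and are not taken from the patch or from pr96166.c.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: swap the two 32-bit elements in memory.  This is
   the kind of pattern that slp vectorization can turn into
   movq/pshufd/movq.  */
void swap_elements(uint32_t v[2])
{
    uint32_t tmp = v[0];
    v[0] = v[1];
    v[1] = tmp;
}

/* The same operation expressed as a 64-bit rotate by 32, which is what
   rolq $32, (%rdi) does.  memcpy avoids aliasing problems.  */
void rotate_by_32(uint32_t v[2])
{
    uint64_t x;
    memcpy(&x, v, sizeof x);
    x = (x << 32) | (x >> 32);
    memcpy(v, &x, sizeof x);
}

/* Decode a pshufd immediate: each destination element dst (0..3) is
   selected by a 2-bit field of imm8; return the source element index.  */
unsigned pshufd_source(unsigned imm8, unsigned dst)
{
    return (imm8 >> (2 * dst)) & 3u;
}
```

With imm8 = 225 (binary 11 10 00 01), destination element 0 takes source
element 1 and destination element 1 takes source element 0, while elements
2 and 3 stay in place. Only the low 64 bits are stored back by the movq, so
the net effect is the element swap that the rotate performs directly.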

[Bug target/96166] [10/11 Regression] -O3/-ftree-slp-vectorize turns ROL into a mess
