https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125357

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <[email protected]>:

https://gcc.gnu.org/g:4446d3e1045bd3728f8e57ee4af85f7d1b190e4f

commit r17-617-g4446d3e1045bd3728f8e57ee4af85f7d1b190e4f
Author: Jakub Jelinek <[email protected]>
Date:   Wed May 20 08:49:06 2026 +0200

    i386: Use vpaddq + vpermilpd for some non-const permutations [PR125357]

    On Tue, May 19, 2026 at 10:30:16AM +0200, Jakub Jelinek wrote:
    > On Tue, May 19, 2026 at 10:51:37AM +0300, Alexander Monakov wrote:
    > > Thanks for looking at the issue, I really appreciate it. The same problem
    > > exists with 64-bit lanes (V2DF/V2SI modes, we fail to utilize vpermilpd).
    >
    > The control in that case is in bits 1 and 65 rather than 0 and 64.
    > So, in order to use vpermilpd for
    > __builtin_shuffle (v2di_or_v2df, v2di);
    > one would need to first shift the mask (or vpaddq with itself).
    > Though, that is still shorter than what we emit right now.

    The following seems to work for me.

    -       movl    $1, %eax
    -       vmovq   %rax, %xmm2
    -       vpunpcklqdq     %xmm2, %xmm2, %xmm2
    -       vpand   %xmm2, %xmm1, %xmm1
    -       vpsllq  $3, %xmm1, %xmm1
    -       vpshufb .LC1(%rip), %xmm1, %xmm1
    -       vpaddb  .LC2(%rip), %xmm1, %xmm1
    -       vpshufb %xmm1, %xmm0, %xmm0
    +       vpaddq  %xmm1, %xmm1, %xmm1
    +       vpermilpd       %xmm1, %xmm0, %xmm0

    for both V2DI and V2DF.
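
As a rough illustration of why doubling the mask is enough, here is a scalar C model of the two-instruction sequence. It is only a sketch, not code from the patch: the function names are hypothetical, and it assumes the documented behaviour that __builtin_shuffle takes each mask element modulo the element count (so only bit 0 matters for two elements) and that variable-control vpermilpd reads bit 1 of each 64-bit control element.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of __builtin_shuffle (v, mask) on a 2 x 64-bit vector.
   Mask elements are taken modulo 2, so only bit 0 of each one is used.  */
void
shuffle_v2di_model (const uint64_t v[2], const uint64_t mask[2],
                    uint64_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = v[mask[i] & 1];
}

/* Hypothetical model of vpaddq %x, %x, %x: each 64-bit element is doubled
   modulo 2^64, which moves bit 0 of the mask into bit 1.  */
void
vpaddq_self_model (const uint64_t m[2], uint64_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = m[i] + m[i];
}

/* Hypothetical model of variable-control vpermilpd on one 128-bit lane:
   the selector is bit 1 of each 64-bit control element.  */
void
vpermilpd_model (const uint64_t v[2], const uint64_t ctrl[2],
                 uint64_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = v[(ctrl[i] >> 1) & 1];
}

/* Return 1 if vpaddq + vpermilpd gives the same result as the shuffle.  */
int
paddq_permilpd_matches_shuffle (const uint64_t v[2], const uint64_t mask[2])
{
  uint64_t expect[2], ctrl[2], got[2];

  shuffle_v2di_model (v, mask, expect);
  vpaddq_self_model (mask, ctrl);
  vpermilpd_model (v, ctrl, got);
  return expect[0] == got[0] && expect[1] == got[1];
}
```

Bits above bit 0 of the mask are ignored by the shuffle, and after doubling they land in bits that vpermilpd ignores, so in this model no separate masking step is needed.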

    2026-05-20  Jakub Jelinek  <[email protected]>

            PR target/125357
            * config/i386/i386-expand.cc (ix86_expand_vec_perm): For TARGET_AVX
            one_operand_shuffle handle also V2DImode and V2DFmode using
            vpaddq and vpermilpd.

            * gcc.target/i386/avx-pr125357-2.c: New test.
            * gcc.target/i386/avx2-pr125357-2.c: New test.

    Reviewed-by: Hongtao Liu <[email protected]>
