https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106081

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |96208

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
PR96208 is about SLP of non-grouped loads.  We can now convert short -> double,
and with the grouped load hacked and -march=znver3 we get:

.L2:
        vmovdqu (%rax), %ymm0
        vpermq  $27, -24(%rdi), %ymm10
        addq    $32, %rax
        subq    $32, %rdi
        vpshufb %ymm7, %ymm0, %ymm0
        vpermpd $85, %ymm10, %ymm9
        vpermpd $170, %ymm10, %ymm8
        vpermpd $255, %ymm10, %ymm6
        vpmovsxwd       %xmm0, %ymm1
        vextracti128    $0x1, %ymm0, %xmm0
        vbroadcastsd    %xmm10, %ymm10
        vcvtdq2pd       %xmm1, %ymm11
        vextracti128    $0x1, %ymm1, %xmm1
        vpmovsxwd       %xmm0, %ymm0
        vcvtdq2pd       %xmm1, %ymm1
        vfmadd231pd     %ymm10, %ymm11, %ymm5
        vfmadd231pd     %ymm9, %ymm1, %ymm2
        vcvtdq2pd       %xmm0, %ymm1
        vextracti128    $0x1, %ymm0, %xmm0
        vcvtdq2pd       %xmm0, %ymm0
        vfmadd231pd     %ymm8, %ymm1, %ymm4
        vfmadd231pd     %ymm6, %ymm0, %ymm3
        cmpq    %rax, %rdx
        jne     .L2

That is, the 'short' data type forces a higher VF on us, and the splat
codegen I hacked in is still sub-optimal.
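
As a rough illustration of the VF remark, the lane arithmetic behind the loop
above can be sketched in C.  The helper names lanes_per_vector and
vectors_per_iteration are made up for this note and are not GCC internals.
The sketch assumes 32-byte (256-bit AVX2) vectors, a 2-byte short, an 8-byte
double, and that the VF is set by the narrowest element type in the loop.

```c
/* Hypothetical helpers for illustration only; not part of GCC. */

/* Number of elements of elem_bytes that fit in one vector of vec_bytes. */
static unsigned lanes_per_vector(unsigned vec_bytes, unsigned elem_bytes)
{
    return vec_bytes / elem_bytes;
}

/* Number of vectors of elem_bytes elements needed to cover vf scalar
   iterations, rounding up when vf is not a multiple of the lane count. */
static unsigned vectors_per_iteration(unsigned vf, unsigned vec_bytes,
                                      unsigned elem_bytes)
{
    unsigned lanes = lanes_per_vector(vec_bytes, elem_bytes);
    return (vf + lanes - 1) / lanes;
}
```

Under these assumptions lanes_per_vector(32, 2) gives a VF of 16 and
vectors_per_iteration(16, 32, 8) gives 4, which matches the four vcvtdq2pd,
the four vfmadd231pd and the four splats in the loop body.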


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208
[Bug 96208] non-grouped load can be SLP vectorized for 2-element vectors case
