https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876

Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |16.0
         Resolution|---                         |FIXED
             Status|WAITING                     |RESOLVED

--- Comment #8 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #7)
> I think this is now fixed after r16-2298-g4d6c3f3b4fbf8c , can you confirm?

Yes, it's fixed. It now generates:

test:
.LFB0:
        .cfi_startproc
        movl    $2, %eax
        vpxor   %xmm5, %xmm5, %xmm5
        vpbroadcastd    %eax, %zmm4
        vpternlogd      $0xFF, %zmm5, %zmm5, %zmm5
        xorl    %eax, %eax
        vpxor   %xmm6, %xmm6, %xmm6
        vmovdqa32       %zmm5, %zmm2
        vmovdqa32       %zmm6, %zmm3
        vmovdqa32       %zmm4, %zmm1
        .p2align 6
        .p2align 4
        .p2align 3
.L2:
        vmovdqa32       b(%rax), %zmm0
        addq    $64, %rax
        vpcmpd  $6, %zmm3, %zmm0, %k1
        vmovdqa32       %zmm1, %zmm0
        vpsrld  $31, %zmm2, %zmm0{%k1}
        vpaddd  c-64(%rax), %zmm0, %zmm0
        vmovdqa32       %zmm0, a-64(%rax)
        cmpq    $3968, %rax
        jne     .L2

I think this is better than clang's output: GCC uses a masked vpsrld instead
of vpblendmd to select between 1 and 2.
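
For readers following the assembly, here is a rough scalar model of what the
loop appears to do per 32-bit lane. It is a sketch read off the generated code
above, not the original test case (which is not shown in this comment). The
function names, the element count N (derived from the 3968-byte loop bound),
and the assumption that a, b and c are int arrays are all hypothetical. The
point it illustrates: %zmm0 is preloaded with 2, and the masked shift of an
all-ones register right by 31 writes 1 into the lanes where b > 0, so no
separate blend is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed element count: 3968 bytes / 4 bytes per 32-bit lane. */
#define N 992

/* Hypothetical reconstruction of the source-level loop. */
static void select_ref(int32_t *a, const int32_t *b, const int32_t *c)
{
    for (int i = 0; i < N; i++)
        a[i] = c[i] + (b[i] > 0 ? 1 : 2);
}

/* Per-lane model of the generated instruction sequence. */
static void select_masked_shift(int32_t *a, const int32_t *b, const int32_t *c)
{
    const uint32_t all_ones = 0xFFFFFFFFu;   /* %zmm2, from vpternlogd $0xFF */
    for (int i = 0; i < N; i++) {
        int k = b[i] > 0;                    /* vpcmpd $6: signed b > 0 into %k1 */
        uint32_t v = 2;                      /* vmovdqa32 %zmm1, %zmm0 */
        if (k)
            v = all_ones >> 31;              /* vpsrld $31 under mask: yields 1 */
        a[i] = (int32_t)((uint32_t)c[i] + v); /* vpaddd, then store to a */
    }
}

/* Checks that both versions agree for the given inputs. */
static void check_equivalent(const int32_t *b, const int32_t *c)
{
    int32_t x[N], y[N];
    select_ref(x, b, c);
    select_masked_shift(y, b, c);
    for (int i = 0; i < N; i++)
        assert(x[i] == y[i]);
}
```

Calling check_equivalent() on inputs with negative, zero and positive values
in b exercises both arms of the select.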
