https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119876
Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |16.0
         Resolution|---                         |FIXED
             Status|WAITING                     |RESOLVED

--- Comment #8 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #7)
> I think this is now fixed after r16-2298-g4d6c3f3b4fbf8c , can you confirm?

Yes, it's fixed. It now generates:

test:
.LFB0:
        .cfi_startproc
        movl        $2, %eax
        vpxor       %xmm5, %xmm5, %xmm5
        vpbroadcastd        %eax, %zmm4
        vpternlogd  $0xFF, %zmm5, %zmm5, %zmm5
        xorl        %eax, %eax
        vpxor       %xmm6, %xmm6, %xmm6
        vmovdqa32   %zmm5, %zmm2
        vmovdqa32   %zmm6, %zmm3
        vmovdqa32   %zmm4, %zmm1
        .p2align 6
        .p2align 4
        .p2align 3
.L2:
        vmovdqa32   b(%rax), %zmm0
        addq        $64, %rax
        vpcmpd      $6, %zmm3, %zmm0, %k1
        vmovdqa32   %zmm1, %zmm0
        vpsrld      $31, %zmm2, %zmm0{%k1}
        vpaddd      c-64(%rax), %zmm0, %zmm0
        vmovdqa32   %zmm0, a-64(%rax)
        cmpq        $3968, %rax
        jne         .L2

I think this is better than clang's code: GCC uses vpsrld instead of vpblendmd to select between 1 and 2.
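The testcase source is not quoted in this comment, so the following is an inference from the loop body only: it appears to compute a[i] = c[i] + (b[i] > 0 ? 1 : 2). %zmm1 holds a broadcast 2, vpcmpd with predicate 6 (NLE, i.e. signed greater-than) sets %k1 where b[i] > 0, and in those lanes vpsrld $31 turns the all-ones vector (from vpternlogd $0xFF) into 1, because 0xFFFFFFFF >> 31 == 1. That is how the 1/2 select is done without a vpblendmd between two constant vectors. The scalar C sketch below emulates one lane step by step and checks it against the plain conditional; the names N, scalar_ref, gcc_lane and count_mismatches, the array size, and the input data are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size: the loop shown advances 64 bytes (16 ints) per
   iteration until %rax reaches 3968, i.e. 3968 / 4 = 992 ints. */
#define N 992

/* Hypothetical globals standing in for the testcase's a, b and c. */
int a[N], b[N], c[N];

/* Plain C form of what the vector loop appears to compute (assumption). */
static void scalar_ref(int *out)
{
    for (int i = 0; i < N; i++)
        out[i] = c[i] + (b[i] > 0 ? 1 : 2);
}

/* Scalar emulation of one lane of the GCC loop. */
static int gcc_lane(int bi, int ci)
{
    const uint32_t all_ones = 0xFFFFFFFFu; /* vpternlogd $0xFF, copied to %zmm2 */
    uint32_t sel = 2u;                     /* vmovdqa32 %zmm1, %zmm0 (broadcast 2) */
    if (bi > 0)                            /* vpcmpd $6 (NLE) against zero -> %k1 */
        sel = all_ones >> 31;              /* masked vpsrld $31: logical shift gives 1 */
    return (int)sel + ci;                  /* vpaddd (no overflow for the small test data) */
}

/* Fill b and c with hypothetical data, compute a[] via the lane emulation
   and a reference via scalar_ref, and return the number of mismatches. */
static int count_mismatches(void)
{
    static int ref[N];

    /* The identity the trick relies on: all-ones shifted right by 31 is 1. */
    assert((0xFFFFFFFFu >> 31) == 1u);

    for (int i = 0; i < N; i++) {
        b[i] = (i % 7) - 3; /* mix of negative, zero and positive values */
        c[i] = i;
    }
    scalar_ref(ref);
    for (int i = 0; i < N; i++)
        a[i] = gcc_lane(b[i], c[i]);

    int bad = 0;
    for (int i = 0; i < N; i++)
        bad += (a[i] != ref[i]);
    return bad;
}
```

Called from a main, count_mismatches() should return 0, since the masked shift and the conditional pick the same constant in every lane, including b[i] == 0, where the signed greater-than mask is clear and the lane keeps 2.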