https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88547
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
For 64-byte vectors, we emit
        vpcmpgtb        %zmm1, %zmm0, %k1
        vpxor   %xmm1, %xmm1, %xmm1
        vpternlogd      $0xFF, %zmm0, %zmm0, %zmm0
        vmovdqu8        %zmm1, %zmm0{%k1}
for f1, perhaps it would be better to emit:
        vpcmpgtb        %zmm1, %zmm0, %k1
        knotq   %k1, %k1
        vpmovm2b        %k1, %zmm0
?
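
For readers without the testcase at hand, here is a minimal C sketch of what f1 presumably looks like. The source of f1 is not quoted in this comment, so its shape is an assumption inferred from the assembly: both sequences produce all-ones bytes where the signed comparison `a > b` is false and zero bytes elsewhere, i.e. a lane-wise `a <= b`. The type name `v64qi` and the helper `f1_matches_scalar` are hypothetical and only serve the illustration.

```c
#include <assert.h>

/* 64 signed bytes per vector, using the GCC vector extension. */
typedef signed char v64qi __attribute__((vector_size(64)));

/* Assumed shape of f1 (not taken from the bug report's testcase):
   lane-wise a <= b, yielding -1 (all bits set) where true, 0 where false. */
v64qi f1(v64qi a, v64qi b)
{
    return a <= b;
}

/* Hypothetical helper: asserts that every lane of f1 agrees with a
   scalar reference comparison, then returns 1. */
int f1_matches_scalar(void)
{
    v64qi a, b;
    for (int i = 0; i < 64; i++) {
        a[i] = (signed char)(i - 32);
        b[i] = (signed char)((i * 7) % 64 - 32);
    }

    v64qi r = f1(a, b);
    for (int i = 0; i < 64; i++) {
        signed char expected = (a[i] <= b[i]) ? -1 : 0;
        assert(r[i] == expected);
    }
    return 1;
}
```

Compiling something like this with `-O2 -mavx512bw` should make it possible to compare the generated code against the sequences above, though the exact output will depend on the GCC version.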