https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639
--- Comment #27 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> >
> > seems it's tricky to mate a != 0 compare with the all-zero vptest optimally,
> It could possibly be handled in combine; we already have ptest for CCZ and
> CCC separately, so if only CCZ is needed, then (unspec:CCZ (eq (eq op const0)
> const0) unspec_ptest) can be simplified.
For reduc_mask_ior, it can be further optimized to the following under AVX2:
.cfi_startproc
vmovdqu (%rdi), %ymm0
vptest %ymm0, %ymm0
setne %al
vzeroupper
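To make the listing above easier to follow, here is a minimal scalar sketch in C. The testcase source is not shown in this comment, so the shape of reduc_mask_ior (OR-ing `a[i] != 0` over eight 32-bit ints) is an assumption, and both function names are hypothetical. The second function models what vptest followed by setne computes: vptest sets ZF when the AND of its two operands is all zero, and setne returns the inverse of ZF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed scalar form of reduc_mask_ior (not taken from the testcase):
   true if any of the eight lanes is nonzero. */
bool reduc_mask_ior_model(const int32_t a[8])
{
    bool r = false;
    for (int i = 0; i < 8; i++)
        r |= (a[i] != 0);
    return r;
}

/* Model of "vptest %ymm0, %ymm0; setne %al".
   With both operands equal, the AND is the register itself, so ZF is set
   exactly when every bit of the 256-bit value is zero. */
bool vptest_setne_model(const int32_t a[8])
{
    uint32_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc |= (uint32_t)a[i];   /* any set bit anywhere clears ZF */
    bool zf = (acc == 0);
    return !zf;                  /* setne: result is 1 when ZF is clear */
}
```

Under that assumption the two functions agree on every input, which is why no compare is needed in front of the vptest for the ior case.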
But for reduc_mask_and, the result is:
.cfi_startproc
vpxor %xmm1, %xmm1, %xmm1
vpcmpeqd (%rdi), %ymm1, %ymm0
vpcmpeqd %ymm1, %ymm0, %ymm0
vpcmpeqd %ymm1, %ymm1, %ymm1
vpxor %ymm1, %ymm0, %ymm0
vptest %ymm0, %ymm0
sete %al
versus clang:
vpxor xmm0, xmm0, xmm0
vpcmpeqd ymm0, ymm0, ymmword ptr [rdi]
vmovmskps eax, ymm0
test eax, eax
sete al
It is hard to fix this in combine.
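As a cross-check of the two reduc_mask_and listings, the following C sketch models each instruction on eight 32-bit lanes. It is only my reading of the assembly, not compiler output, and the helper and function names are hypothetical. As far as I can tell, the second vpcmpeqd against zero and the vpxor with all-ones in the GCC sequence are two inversions that cancel each other, so both sequences end up testing the same "lane == 0" mask.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 8

/* vpcmpeqd: each lane becomes all-ones when equal, otherwise zero. */
static void cmpeq(const uint32_t x[LANES], const uint32_t y[LANES],
                  uint32_t out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = (x[i] == y[i]) ? 0xFFFFFFFFu : 0u;
}

/* Model of the GCC listing. */
bool gcc_seq_model(const uint32_t mem[LANES])
{
    uint32_t zero[LANES] = {0};          /* vpxor %xmm1, %xmm1, %xmm1 */
    uint32_t is_zero[LANES], non_zero[LANES], ones[LANES];
    cmpeq(mem, zero, is_zero);           /* vpcmpeqd (%rdi), %ymm1, %ymm0 */
    cmpeq(is_zero, zero, non_zero);      /* vpcmpeqd %ymm1, %ymm0, %ymm0 */
    cmpeq(zero, zero, ones);             /* vpcmpeqd %ymm1, %ymm1, %ymm1 */
    uint32_t acc = 0;
    for (int i = 0; i < LANES; i++)
        acc |= non_zero[i] ^ ones[i];    /* vpxor, then vptest x, x */
    return acc == 0;                     /* sete: 1 when ZF is set */
}

/* Model of the clang listing. */
bool clang_seq_model(const uint32_t mem[LANES])
{
    uint32_t zero[LANES] = {0};          /* vpxor xmm0, xmm0, xmm0 */
    uint32_t is_zero[LANES];
    cmpeq(mem, zero, is_zero);           /* vpcmpeqd ymm0, ymm0, [rdi] */
    unsigned mask = 0;
    for (int i = 0; i < LANES; i++)
        mask |= (is_zero[i] >> 31) << i; /* vmovmskps gathers sign bits */
    return mask == 0;                    /* test eax, eax; sete al */
}
```

In this model both functions return true exactly when no lane is zero, so the difference between the two listings is instruction count, not the result.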