https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #11)
> (In reply to Hongtao Liu from comment #10)
> > clang generates
> > 
> > avx512:
> > f(int*, long):
> >         vmovdqu xmm0, xmmword ptr [rdi]
> >         vptestnmd       k0, xmm0, xmm0
> >         kortestb        k0, k0
> >         sete    al
> >         ret
> > 
> > avx2:
> > f(int*, long):
> >         vpxor   xmm0, xmm0, xmm0
> >         vpcmpeqd        xmm0, xmm0, xmmword ptr [rdi]
> >         vmovmskps       eax, xmm0
> >         test    eax, eax
> >         sete    al
> >         ret
> > 
> > Maybe GCC can reuse cstorem4 similar as cbranchm4 for those mask.
> 
> Yes, I have not tried to implement native vector mask reduction, instead
> I'm going via a data bool vector for the epilogue to use tested code.

For XOR, cstorem4 isn't of help, but if we can get a scalar bit mask we
can use popcount&1 here.  Targets with separate vector modes for masks
can use reduc_{and,ior,xor}_scal, but on x86, where masks have either integer
vector modes or integer scalar modes, that's going to be difficult.  A more
explicit reduc_mask_{and,ior,xor}_scal would be better there.
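
To illustrate the scalar bit mask idea, here is a rough C sketch of the three
reductions, assuming the mask already sits in a scalar with one bit per lane
(e.g. after vmovmskps or a kmov).  The helper names are made up for this
example and are not GCC optabs or existing functions; __builtin_popcount is
the GCC/clang builtin.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only.  'mask' carries one bit per
   vector lane in its low 'nlanes' bits (1 <= nlanes <= 32).  */

/* All-ones value covering the low 'nlanes' bits.  */
static uint32_t lane_bits(unsigned nlanes)
{
    return nlanes >= 32 ? UINT32_MAX : (UINT32_C(1) << nlanes) - 1;
}

/* AND reduction: true iff every lane bit is set.  */
static bool mask_reduce_and(uint32_t mask, unsigned nlanes)
{
    uint32_t all = lane_bits(nlanes);
    return (mask & all) == all;
}

/* IOR reduction: true iff any lane bit is set.  */
static bool mask_reduce_ior(uint32_t mask, unsigned nlanes)
{
    return (mask & lane_bits(nlanes)) != 0;
}

/* XOR reduction: parity of the set lane bits, i.e. popcount & 1.  */
static bool mask_reduce_xor(uint32_t mask, unsigned nlanes)
{
    return __builtin_popcount(mask & lane_bits(nlanes)) & 1;
}
```

AND and IOR reduce to a compare against all-ones or zero, which is what the
test/sete sequences quoted above do; only XOR needs the popcount.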
