https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #10)
> clang generates
>
> avx512:
> f(int*, long):
>         vmovdqu xmm0, xmmword ptr [rdi]
>         vptestnmd k0, xmm0, xmm0
>         kortestb k0, k0
>         sete al
>         ret
>
> avx2:
> f(int*, long):
>         vpxor xmm0, xmm0, xmm0
>         vpcmpeqd xmm0, xmm0, xmmword ptr [rdi]
>         vmovmskps eax, xmm0
>         test eax, eax
>         sete al
>         ret
>
> Maybe GCC can reuse cstorem4 similar as cbranchm4 for those mask.

Yes, I have not tried to implement native vector mask reduction, instead I'm
going via a data bool vector for the epilogue to use tested code.
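
For readers following the quoted assembly, here is a rough scalar model in C of what the two clang sequences appear to compute: build a per-lane "equals zero" mask over four 32-bit elements, then reduce that mask to a single boolean. This is only a sketch inferred from the instructions shown. The source of `f` is not part of this comment, and the helper names `zero_lane_mask` and `no_lane_is_zero` are hypothetical, not anything from GCC or clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: scalar model of the per-lane compare.
   Bit i of the result is set when lane i equals zero, which is what
   vpcmpeqd + vmovmskps (AVX2) or vptestnmd (AVX-512) leave in the mask. */
static unsigned zero_lane_mask(const int32_t *p)
{
    unsigned mask = 0;
    for (int i = 0; i < 4; i++) {
        if (p[i] == 0)
            mask |= 1u << i;
    }
    return mask;
}

/* Hypothetical helper: scalar model of the mask reduction.
   "test eax, eax; sete al" (or "kortestb k0, k0; sete al") yields 1
   exactly when the mask is all zeros, i.e. when no lane is zero. */
static bool no_lane_is_zero(const int32_t *p)
{
    return zero_lane_mask(p) == 0;
}
```

In this model the reduction works directly on the mask bits, which is the "native vector mask reduction" being discussed, as opposed to first expanding the mask into a data vector of booleans and reducing that.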
