https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639
--- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 15 Oct 2025, liuhongt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639
>
> --- Comment #16 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #14)
> > (In reply to Hongtao Liu from comment #13)
> > > >
> > > > For XOR cstorem4 isn't of help, but if we can get a scalar bit mask
> > > > we can use popcount&1 here.  Targets with separate vector modes for
> > > > masks can use reduc_{and,ior,xor}_scal but on x86 with either
> > > > integer vector modes or integer scalar modes that's going to be
> > > > difficult.  A more explicit reduc_mask_{and,ior,xor}_scal would be
> > > > better there.
> > >
> > > Yes, indeed, x86 can use vpmovmskb/kmov to convert vector mask to
> > > scalar and then popcnt&1, those implementation can all be done in the
> > > backend expander.
> >
> > But ouch, for two and four bit masks we have all QImode, so
> > reduc_mask_and_scal_qi doesn't work for them.  For IOR and XOR it should
>
> we have vec_pack_sbool_trunc_qi and vec_pack_trunc_qi to handle similar
> issue.  vec_pack_sbool_trunc_qi accepts an additional const int operand to
> indicate the number of elements of output vector.

Ah, yes - now I remember.  I'll fumble the new optabs to do the same,
wonder whether s/mask/sbool/ is better then, as 'mask' usually indicates
masking, so will change that as well.

I'll attach an updated patch later today.
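
To illustrate why an element-count operand matters once two-, four- and
eight-lane masks all share QImode, here is a minimal C sketch.  It is not
GCC code and not the proposed optab: the helper names (valid_bits,
mask_reduc_xor, mask_reduc_ior, mask_reduc_and) and the nelts parameter are
hypothetical, and the mask is assumed to already be a scalar bit mask in the
low bits of a byte (for example after a vpmovmskb/kmov style conversion).

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only, not GCC internals.
   MASK holds one bit per lane in its low NELTS bits; NELTS is assumed
   to be in the range 1..8 so the mask fits a QImode-sized byte.  */

static inline uint8_t
valid_bits (unsigned nelts)
{
  /* Bits that actually correspond to lanes.  */
  return (uint8_t) ((1u << nelts) - 1);
}

/* XOR reduction: parity of the set lanes, i.e. popcount & 1.  */
static inline int
mask_reduc_xor (uint8_t mask, unsigned nelts)
{
  return __builtin_popcount (mask & valid_bits (nelts)) & 1;
}

/* IOR reduction: true if any lane is set.  */
static inline int
mask_reduc_ior (uint8_t mask, unsigned nelts)
{
  return (mask & valid_bits (nelts)) != 0;
}

/* AND reduction: true only if all NELTS lanes are set.  Without NELTS
   the comparison value cannot be derived from the mode alone.  */
static inline int
mask_reduc_and (uint8_t mask, unsigned nelts)
{
  uint8_t valid = valid_bits (nelts);
  return (mask & valid) == valid;
}
```

In this sketch the same byte value 0x3 is "all lanes set" for a two-lane
mask but not for a four-lane one, which is the ambiguity a mode-only
reduc_mask_and_scal_qi would run into.  IOR and XOR only need the count if
the bits above the valid lanes are not guaranteed to be zero, which the
sketch does not assume.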
