https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639

--- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 15 Oct 2025, liuhongt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101639
> 
> --- Comment #16 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #14)
> > (In reply to Hongtao Liu from comment #13)
> > > > 
> > > > For XOR cstorem4 isn't of help, but if we can get a scalar bit mask we
> > > > can use popcount&1 here.  Targets with separate vector modes for masks
> > > > can use reduc_{and,ior,xor}_scal but on x86 with either integer vector
> > > > modes or integer scalar modes that's going to be difficult.  A more
> > > > explicit reduc_mask_{and,ior,xor}_scal would be better there.
> > > 
> > > Yes, indeed, x86 can use vpmovmskb/kmov to convert a vector mask to a
> > > scalar and then popcnt&1; those implementations can all be done in the
> > > backend expander.
> > 
> > But ouch, for two and four bit masks we have all QImode, so
> > reduc_mask_and_scal_qi doesn't work for them.  For IOR and XOR it should
> 
> We have vec_pack_sbool_trunc_qi and vec_pack_trunc_qi to handle a similar
> issue.  vec_pack_sbool_trunc_qi accepts an additional const int operand to
> indicate the number of elements of the output vector.

Ah, yes - now I remember.  I'll fumble the new optabs to do the same.
I wonder whether s/mask/sbool/ is better then, as 'mask' usually indicates
masking, so I'll change that as well.

I'll attach an updated patch later today.
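
For illustration only, here is a minimal C sketch of the scalar side of
the reductions discussed above, assuming the mask has already been moved
into an integer with one bit per lane (for example via vpmovmskb/kmov).
The helper names and the explicit nelts argument are hypothetical and
are not the optab interface or any GCC internal; nelts merely stands in
for the extra element-count operand mentioned for
vec_pack_sbool_trunc_qi.  __builtin_popcount is the GCC/Clang builtin.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not GCC internals.  MASK holds one bit per lane
   in its low NELTS bits (1 <= NELTS <= 8), as a QImode mask would.  */

/* All-ones pattern for the NELTS lanes that are actually in use.  */
static inline uint8_t
lane_bits (unsigned nelts)
{
  return (uint8_t) ((1u << nelts) - 1);
}

/* IOR reduction: true if any lane bit is set.  */
static bool
mask_reduc_ior (uint8_t mask, unsigned nelts)
{
  return (mask & lane_bits (nelts)) != 0;
}

/* XOR reduction: the parity of the set lane bits, i.e. popcount & 1.  */
static bool
mask_reduc_xor (uint8_t mask, unsigned nelts)
{
  return __builtin_popcount (mask & lane_bits (nelts)) & 1;
}

/* AND reduction: true only if every one of the NELTS lane bits is set.  */
static bool
mask_reduc_and (uint8_t mask, unsigned nelts)
{
  return (mask & lane_bits (nelts)) == lane_bits (nelts);
}
```

In this sketch the same 8-bit value 0x3 reduces to true under AND when
nelts is 2 but to false when nelts is 4, which shows why the mode alone
does not carry enough information for two and four bit masks.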
