Re: [PATCH 2/2][RFC] Add loop masking support for x86

Richard Biener Tue, 20 Jul 2021 23:18:05 -0700

On Tue, 20 Jul 2021, Richard Biener wrote:

> On Thu, 15 Jul 2021, Richard Sandiford wrote:
> 
> > Richard Biener <rguent...@suse.de> writes:
> > > The following extends the existing loop masking support using
> > > SVE WHILE_ULT to x86 by providing an alternate way to produce the
> > > mask using VEC_COND_EXPRs.  So with --param vect-partial-vector-usage
> > > you can now enable masked vectorized epilogues (=1) or fully
> > > masked vector loops (=2).
> > 
> > As mentioned on IRC, WHILE_ULT is supposed to ensure that every
> > element after the first zero is also zero.  That happens naturally
> > for power-of-2 vectors if the start index is a multiple of the VF.
> > (And at the moment, variable-length vectors are the only way of
> > supporting non-power-of-2 vectors.)
> > 
> > This probably works fine for =2 and =1 as things stand, since the
> > vector IVs always start at zero.  But if in future we have a single
> > IV counting scalar iterations, and use it even for peeled prologue
> > iterations, we could end up with a situation where the approximation
> > is no longer safe.
> > 
> > E.g. suppose we had a uint32_t scalar IV with a limit of (uint32_t)-3.
> > If we peeled 2 iterations for alignment and then had a VF of 8,
> > the final vector would have a start index of (uint32_t)-6 and the
> > vector would be { -1, -1, -1, 0, 0, 0, -1, -1 }.
> > 
> > So I think it would be safer to handle this as an alternative to
> > using while, rather than as a direct emulation, so that we can take
> > the extra restrictions into account.  Alternatively, we could probably
> > do { 0, 1, 2, ... } < { end - start, end - start, ... }.
> 
> That doesn't end up working since in the last iteration with a
> non-zero mask we'll compare with all underflowed values (start
> will be > end).  So while we compute a correct mask we cannot use
> that for loop control anymore.


Of course I can just use a signed comparison here (until we get
V128QI and a QImode iterator).
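
To make the difference concrete, here is a minimal scalar sketch of
the { 0, 1, 2, ... } < { end - start, ... } mask computation.  It is
not the patch itself: VF, the helper names and the 32-bit element
type are all assumptions chosen for illustration, and the lanes are
modelled with a plain loop rather than vector operations.

```c
#include <assert.h>
#include <stdint.h>

#define VF 8  /* assumed vectorization factor, for illustration only */

/* Hypothetical helper: mask[i] = (i < end - start), compared unsigned.
   Once start > end the subtraction wraps to a huge value, so every
   lane compares true and the mask never becomes all-zero.  */
static void partial_mask_unsigned(uint32_t start, uint32_t end,
                                  int32_t mask[VF])
{
    uint32_t remaining = end - start;
    for (uint32_t i = 0; i < VF; i++)
        mask[i] = (i < remaining) ? -1 : 0;
}

/* Hypothetical helper: the same computation with a signed comparison.
   The wrapped difference is reinterpreted as a negative number (the
   conversion is modular with GCC), so no lane is active once
   start > end.  */
static void partial_mask_signed(uint32_t start, uint32_t end,
                                int32_t mask[VF])
{
    int32_t remaining = (int32_t)(end - start);
    for (int32_t i = 0; i < VF; i++)
        mask[i] = (i < remaining) ? -1 : 0;
}

/* Count the active lanes; an all-zero mask is what loop control
   needs to see in order to exit.  */
static int count_active(const int32_t mask[VF])
{
    int n = 0;
    for (int i = 0; i < VF; i++) {
        assert(mask[i] == 0 || mask[i] == -1);
        n += (mask[i] != 0);
    }
    return n;
}
```

With end = 10, the start values 0, 8 and 16 give 8, 2 and 8 active
lanes with the unsigned variant, but 8, 2 and 0 with the signed one,
so only the signed mask can double as the exit test.  The signed form
relies on the lane indices and the remaining count fitting in the
signed range of the element type, which is how I read the V128QI
caveat above.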

Richard.
