On Wed, 4 May 2022, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > The testcase shows that we can end up with a contiguous access across
> > loop iterations but by means of permutations the elements accessed
> > might only cover parts of a vector.  In this case we end up with
> > GROUP_GAP == 0 but still need to avoid accessing excess elements
> > in the last loop iterations.  Peeling for gaps is designed to cover
> > this but a single scalar iteration might not cover all of the excess
> > elements.  The following ensures peeling for gaps is done in this
> > situation and when that isn't sufficient because we need to peel
> > more than one iteration (gcc.dg/vect/pr103116-2.c), fail the SLP
> > vectorization.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK?
> 
> LGTM.

Thanks, pushed.

> In principle I think we could (in future) handle some of the
> !multiple_p cases for variable-length vectors, but I don't think it
> would ever trigger in practice yet, given the limited permutes we
> support in that case.

I wonder whether, for variable-length vectors, gap peeling could be
avoided altogether by using a static mask.  The mask would of course
have to be repeated until it fills the vector length; I'm not sure
that's always possible for { 1, 1, ..., 0, 0, ... } style masks of
fixed, known (sub-)lengths.
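
To make the static-mask idea concrete, here is a minimal sketch in
plain C.  It is not GCC code: the helper names and the parameters
vl, group and active are made up for illustration.  It builds the
{ 1, ..., 0, ... } pattern for a group of `group` elements of which
the first `active` are accessed, and checks the one condition that,
as far as I can see, decides whether a single mask can be reused:
the vector length has to be a multiple of the group size, otherwise
the pattern changes phase from one vector to the next.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): fill mask[0..vl-1] with
   the pattern { 1 x active, 0 x (group - active) }, repeated until
   the vector length vl is reached.  */
static void
build_repeating_mask (bool *mask, size_t vl, size_t group, size_t active)
{
  assert (group > 0 && active <= group);
  for (size_t i = 0; i < vl; i++)
    mask[i] = (i % group) < active;
}

/* Hypothetical check (illustration only): the same mask is valid for
   every vector iteration only when vl is a multiple of group, so that
   each vector starts at a group boundary.  */
static bool
mask_is_loop_invariant (size_t vl, size_t group)
{
  assert (group > 0);
  return vl % group == 0;
}
```

For example, with vl = 8, group = 4 and active = 3 the mask is
{ 1, 1, 1, 0, 1, 1, 1, 0 } and can be reused in every iteration,
while vl = 6 with group = 4 fails the check.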

Richard.
