On Wed, 4 May 2022, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > The testcase shows that we can end up with a contiguous access across
> > loop iterations but by means of permutations the elements accessed
> > might only cover parts of a vector. In this case we end up with
> > GROUP_GAP == 0 but still need to avoid accessing excess elements
> > in the last loop iterations. Peeling for gaps is designed to cover
> > this but a single scalar iteration might not cover all of the excess
> > elements. The following ensures peeling for gaps is done in this
> > situation and when that isn't sufficient because we need to peel
> > more than one iteration (gcc.dg/vect/pr103116-2.c), fail the SLP
> > vectorization.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK?
>
> LGTM.
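The sufficiency check described in the quoted patch (peel one scalar iteration for gaps, and give up on SLP when one is not enough) can be illustrated with a small arithmetic model. This is a simplified sketch under stated assumptions, not GCC's actual code: it assumes each scalar iteration touches `group_size` contiguous elements (GROUP_GAP == 0), that one vector iteration covers `vf` scalar iterations with `nloads` loads of `nunits` elements each, and that every peeled scalar iteration guarantees `group_size` more valid elements after the vector loop. The function names and parameters are hypothetical.

```python
def iterations_to_peel(group_size: int, vf: int, nunits: int, nloads: int) -> int:
    """Hypothetical model: number of scalar iterations that must be left to
    the epilogue so the last vector iteration never reads past the data the
    scalar loop would have touched.

    group_size: contiguous elements accessed per scalar iteration
    vf:         scalar iterations covered by one vector iteration
    nunits:     elements per vector
    nloads:     vector loads issued per vector iteration
    """
    needed = vf * group_size          # elements the vector iteration really uses
    loaded = nloads * nunits          # elements the vector loads actually read
    excess = max(0, loaded - needed)  # elements read beyond what is needed
    # Each peeled scalar iteration leaves group_size valid elements after
    # the vector loop, so round the excess up to whole groups.
    return (excess + group_size - 1) // group_size


def gap_peeling_suffices(group_size: int, vf: int, nunits: int, nloads: int) -> bool:
    """Model of the policy in the patch description: peeling for gaps peels
    a single scalar iteration; if more are needed, reject SLP."""
    return iterations_to_peel(group_size, vf, nunits, nloads) <= 1


# 12 needed elements loaded as two 8-lane vectors: 4 excess, one group of 6 covers it.
print(iterations_to_peel(6, 2, 8, 2))    # 1
# 4 needed elements loaded as one 8-lane vector: 4 excess needs two groups of 2.
print(gap_peeling_suffices(2, 2, 8, 1))  # False
```

The second call mirrors the situation the patch rejects: the excess spans more than one scalar iteration's worth of elements, so peeling a single iteration cannot make the final vector access safe.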
Thanks, pushed.

> In principle I think we could (in future) handle some of the
> !multiple_p cases for variable-length vectors, but I don't think it
> would ever trigger in practice yet, given the limited permutes we
> support in that case.

I wonder if for variable-length vectors the gap peeling can be better
avoided by using a static mask? It would of course be repeated up to
the vector length; I'm not sure that's always possible for
{ 1, 1, ..., 0, 0, ... } style masks of fixed, known (sub-)lengths.

Richard.
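The static-mask idea raised in the reply can be sketched as a lane pattern of `active` ones followed by `inactive` zeros, tiled up to the vector length. This is only a model of the tiling concern, under the assumption that a variable-length vector has `min_nunits * k` lanes for k = 1 .. `max_multiple` (similar to scalable-vector targets); it says nothing about whether a target can materialize such a mask cheaply. The helper names are hypothetical.

```python
from typing import List, Optional


def repeating_lane_mask(active: int, inactive: int, nunits: int) -> Optional[List[bool]]:
    """Hypothetical helper: a static { 1, ..., 1, 0, ..., 0 } lane mask
    (active ones, then inactive zeros) repeated up to nunits lanes.
    Returns None when the pattern does not tile the vector exactly."""
    period = active + inactive
    if period <= 0 or nunits % period != 0:
        return None
    return ([True] * active + [False] * inactive) * (nunits // period)


def mask_ok_for_all_lengths(active: int, inactive: int,
                            min_nunits: int, max_multiple: int) -> bool:
    """Assumed variable-length model: the lane count is min_nunits * k for
    k in 1..max_multiple. The static mask is only usable if the pattern
    tiles every one of those lengths."""
    return all(
        repeating_lane_mask(active, inactive, min_nunits * k) is not None
        for k in range(1, max_multiple + 1)
    )


print(repeating_lane_mask(2, 2, 8))            # [True, True, False, False, True, True, False, False]
print(mask_ok_for_all_lengths(2, 2, 4, 16))    # True
print(mask_ok_for_all_lengths(2, 1, 4, 16))    # False
```

Since every lane count in this model is a multiple of `min_nunits`, the pattern tiles all of them exactly when its period divides `min_nunits`; the explicit loop just makes that condition visible. The `(2, 1)` case shows the kind of fixed sub-length pattern for which a repeated static mask is not always possible.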