On Wed, 1 Mar 2017, Richard Sandiford wrote:

> Sorry for the late reply, but:
> 
> Richard Biener <rguent...@suse.de> writes:
> > On Mon, 7 Nov 2016, Richard Biener wrote:
> >
> >> 
> >> Currently we force peeling for gaps whenever element overrun can occur
> >> but for aligned accesses we know that the loads won't trap and thus
> >> we can avoid this.
> >> 
> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
> >> some testsuite fallout here so didn't bother to invent a new testcase).
> >> 
> >> Just in case somebody thinks the overrun is a bad idea in general
> >> (even when not trapping).  Like for ASAN or valgrind.
> >
> > This is what I applied.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > Richard.
> [...]
> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> > index 15aec21..c29e73d 100644
> > --- a/gcc/tree-vect-stmts.c
> > +++ b/gcc/tree-vect-stmts.c
> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
> >        /* If there is a gap at the end of the group then these optimizations
> >      would access excess elements in the last iteration.  */
> >        bool would_overrun_p = (gap != 0);
> > +      /* If the access is aligned an overrun is fine.  */
> > +      if (would_overrun_p
> > +     && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> > +   would_overrun_p = false;
> >        if (!STMT_VINFO_STRIDED_P (stmt_info)
> >       && (can_overrun_p || !would_overrun_p)
> >       && compare_step_with_zero (stmt) > 0)
> 
> ...is this right for all cases?  I think it only looks for single-vector
> alignment, but the gap can in principle be vector-sized or larger,
> at least for load-lanes.
>
> E.g. say we have a 128-bit vector of doubles in a group of size 4
> and a gap of 2 or 3.  Even if the access itself is aligned, the group
> spans two vectors and we have no guarantee that the second one
> is mapped.

The check assumes that if aligned_access_p () returns true then the
whole access is aligned in such a way that it can't cross a page boundary.
That's of course not the case if the alignment is 16 bytes but the size
of the access is a larger multiple of that.
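
To make the page-boundary argument concrete, here is a minimal sketch.
It is not GCC code: crosses_page_p is a hypothetical helper and the
4096-byte page size in the usage note is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of GCC: return true if the byte range
   [addr, addr + size) touches more than one page.  PAGE_SIZE must be a
   nonzero power of two.  */
static bool
crosses_page_p (uintptr_t addr, uintptr_t size, uintptr_t page_size)
{
  assert (size > 0 && page_size != 0
          && (page_size & (page_size - 1)) == 0);
  /* Round the first and the last accessed byte down to their page.  */
  uintptr_t first_page = addr & ~(page_size - 1);
  uintptr_t last_page = (addr + size - 1) & ~(page_size - 1);
  return first_page != last_page;
}
```

With 4096-byte pages, a 16-byte access at the 16-byte-aligned address
4080 stays within one page, while a 32-byte access at the same address
does not: crosses_page_p (4080, 16, 4096) is false and
crosses_page_p (4080, 32, 4096) is true.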
 
> I haven't been able to come up with a testcase though.  We seem to be
> overly conservative when computing alignments.

Not sure if we can run into this with load-lanes, given that it bumps the
vectorization factor.  Also, does load-lanes work with gaps?

I think that the gap can never be larger than nunits-1, so it is by
definition in the last "vector", independent of the VF.

The classical gap case is

for (i=0; i<n; ++i)
 {
   y[3*i + 0] = x[4*i + 0];
   y[3*i + 1] = x[4*i + 1];
   y[3*i + 2] = x[4*i + 2];
 }

where x has a gap of 1.  You'll get a VF of 12 for the above.  Make
the y's different streams and you should get the perfect case for
load-lanes:

for (i=0; i<n; ++i)
 {
   y[i] = x[4*i + 0];
   z[i] = x[4*i + 1];
   w[i] = x[4*i + 2];
 } 

Previously we'd peel at least 4 iterations into the epilogue for
fear of accessing x[4*i + 3].  When x is V4SI-aligned that's
OK.
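
As a rough sketch of the arithmetic behind both examples in this thread
(not GCC code: overrun_stays_in_vector_p is a hypothetical helper, and
it assumes the start of the last group is aligned to the vector size),
the following checks whether the gap elements live in the same aligned
vector as the last element that is actually used:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC.  Byte offsets are relative to
   the start of the last group, which is assumed to be VEC_SIZE-aligned.
   Return true if the trailing GAP elements of the group sit in the same
   aligned vector as the last element that is really used.  */
static bool
overrun_stays_in_vector_p (size_t group_size, size_t gap,
                           size_t elt_size, size_t vec_size)
{
  assert (gap < group_size && elt_size > 0 && vec_size > 0);
  /* Last byte of the last used element.  */
  size_t last_used_byte = (group_size - gap) * elt_size - 1;
  /* Last byte of the whole group, including the gap.  */
  size_t last_loaded_byte = group_size * elt_size - 1;
  return last_used_byte / vec_size == last_loaded_byte / vec_size;
}
```

For the loop above (group of 4 ints, gap of 1, 16-byte vectors),
overrun_stays_in_vector_p (4, 1, 4, 16) is true.  For the case of a
128-bit vector of doubles in a group of size 4 with a gap of 2,
overrun_stays_in_vector_p (4, 2, 8, 16) is false, because the gap
elements form a second vector of their own.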

Richard.
