On Wed, 1 Mar 2017, Richard Sandiford wrote:

> Sorry for the late reply, but:
>
> Richard Biener <rguent...@suse.de> writes:
> > On Mon, 7 Nov 2016, Richard Biener wrote:
> >
> >>
> >> Currently we force peeling for gaps whenever element overrun can occur
> >> but for aligned accesses we know that the loads won't trap and thus
> >> we can avoid this.
> >>
> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect
> >> some testsuite fallout here so didn't bother to invent a new testcase).
> >>
> >> Just in case somebody thinks the overrun is a bad idea in general
> >> (even when not trapping).  Like for ASAN or valgrind.
> >
> > This is what I applied.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > Richard.
> [...]
> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> > index 15aec21..c29e73d 100644
> > --- a/gcc/tree-vect-stmts.c
> > +++ b/gcc/tree-vect-stmts.c
> > @@ -1789,6 +1794,10 @@ get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
> >        /* If there is a gap at the end of the group then these optimizations
> >           would access excess elements in the last iteration.  */
> >        bool would_overrun_p = (gap != 0);
> > +      /* If the access is aligned an overrun is fine.  */
> > +      if (would_overrun_p
> > +          && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info)))
> > +        would_overrun_p = false;
> >        if (!STMT_VINFO_STRIDED_P (stmt_info)
> >            && (can_overrun_p || !would_overrun_p)
> >            && compare_step_with_zero (stmt) > 0)
>
> ...is this right for all cases?  I think it only looks for single-vector
> alignment, but the gap can in principle be vector-sized or larger,
> at least for load-lanes.
>
> E.g. say we have a 128-bit vector of doubles in a group of size 4
> and a gap of 2 or 3.  Even if the access itself is aligned, the group
> spans two vectors and we have no guarantee that the second one
> is mapped.

The check assumes that if aligned_access_p () returns true then the
whole access is aligned in a way that it can't cross page boundaries.
That's of course not the case if alignment is 16 bytes but the access
will be a multiple of that.

> I haven't been able to come up with a testcase though.  We seem to be
> overly conservative when computing alignments.

Not sure if we can run into this with load-lanes given that bumps the
vectorization factor.  Also does load-lane work with gaps?

I think that gap can never be larger than nunits-1 so it is by definition
in the last "vector" independent of the VF.

Classical gap case is

for (i=0; i<n; ++i)
 {
   y[3*i + 0] = x[4*i + 0];
   y[3*i + 1] = x[4*i + 1];
   y[3*i + 2] = x[4*i + 2];
 }

where x has a gap of 1.  You'll get VF of 12 for the above.  Make
the y's different streams and you should get the perfect case for
load-lane:

for (i=0; i<n; ++i)
 {
   y[i] = x[4*i + 0];
   z[i] = x[4*i + 1];
   w[i] = x[4*i + 2];
 }

previously we'd peel at least 4 iterations into the epilogue for
the fear of accessing x[4*i + 3].  When x is V4SI aligned that's
ok.

Richard.
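
A minimal sketch of the page-boundary argument above, not taken from
the GCC sources.  It assumes 4 KiB pages and 16-byte vectors; the names
EX_PAGE_SIZE, EX_VECTOR_SIZE, crosses_page_p and
aligned_vectors_stay_in_page_p are hypothetical and exist only for this
illustration.  It shows that one vector-aligned, vector-sized access
never leaves its page, while a group spanning two vectors can, even
when its start is 16-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only: 4 KiB pages, 128-bit vectors.  */
enum { EX_PAGE_SIZE = 4096, EX_VECTOR_SIZE = 16 };

/* Hypothetical helper: true if the byte range [addr, addr + size)
   touches more than one page.  SIZE must be nonzero.  */
static bool
crosses_page_p (uintptr_t addr, size_t size)
{
  assert (size > 0);
  return addr / EX_PAGE_SIZE != (addr + size - 1) / EX_PAGE_SIZE;
}

/* Hypothetical helper: check every vector-aligned start address in one
   page and confirm that a single vector-sized access stays in that page.
   This holds because the page size is a multiple of the vector size.  */
static bool
aligned_vectors_stay_in_page_p (void)
{
  for (uintptr_t a = 0; a < EX_PAGE_SIZE; a += EX_VECTOR_SIZE)
    if (crosses_page_p (a, EX_VECTOR_SIZE))
      return false;
  return true;
}
```

For example, with the start address at EX_PAGE_SIZE - EX_VECTOR_SIZE
(the last aligned slot in a page), crosses_page_p reports false for a
16-byte access but true for a group of 4 doubles (32 bytes), which is
the case Richard Sandiford raises: the second vector may sit on an
unmapped page.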