On Mon, 18 Sep 2017, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > The following is said to fix a 482.sphinx3 regression.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >
> > Richard.
> >
> > 2017-09-18  Richard Biener  <rguent...@suse.de>
> >
> >     PR tree-optimization/82220
> >     * tree-vect-loop.c (vect_estimate_min_profitable_iters): Exclude
> >     epilogue niters from the min_profitable_iters compute.
> >
> > Index: gcc/tree-vect-loop.c
> > ===================================================================
> > --- gcc/tree-vect-loop.c    (revision 252907)
> > +++ gcc/tree-vect-loop.c    (working copy)
> > @@ -3663,8 +3663,8 @@ vect_estimate_min_profitable_iters (loop
> >            min_profitable_iters);
> >  
> >    /* We want the vectorized loop to execute at least once.  */
> > -  if (min_profitable_iters < (vf + peel_iters_prologue + 
> > peel_iters_epilogue))
> > -    min_profitable_iters = vf + peel_iters_prologue + peel_iters_epilogue;
> > +  if (min_profitable_iters < (vf + peel_iters_prologue))
> > +    min_profitable_iters = vf + peel_iters_prologue;
> 
> Maybe we should still add 1 when peeling for gaps?

You mean add vf?  Yes, I guess so.

> Even adding the prologue count seems a bit weird if we've guessed it to
> be vf/2.  Wouldn't it be more profitable to vectorise with an iteration
> count of 1 vector iteration + 1 peeled iteration than 1 vector iteration
> + vf-1 peeled iterations, at least in percentage terms?  Was just wondering
> if we should only add peel_iters_prologue when npeels > 0.

Well, I specifically added the code to compensate for the case where
we only guess the count to vf/2.  I tried to make us reliably avoid
entering the vector path for bwaves -- which of course only works
with versioning -- where we have a runtime iteration count of 5 but
still end up peeling for alignment.

If we exactly know the iterations of the prologue we avoid peeling
for alignment already.

So yes, I made the guessed case err on the scalar side (which regressed
sphinx).  Before my patch it erred on the vector side by estimating
the prologue end epilogue iters as 0.  I suppose we should split the
handling to correctly handle the non-guessed case and have some
explicit code doing sth for the guessed one.

OTOH we might want to make the cost model check combined with the
prologue peeling properly handle the prologue/epilogue iterations.
I didn't check whether we do that (but IIRC the execution flow is
slightly sub-optimal here).

Richard.

Reply via email to