Re: [PATCH] Fix PR82220

Richard Sandiford Tue, 19 Sep 2017 01:29:30 -0700

Richard Biener <rguent...@suse.de> writes:
> On Mon, 18 Sep 2017, Richard Sandiford wrote:
>> Richard Biener <rguent...@suse.de> writes:
>> > The following is said to fix a 482.sphinx3 regression.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>> >
>> > Richard.
>> >
>> > 2017-09-18  Richard Biener  <rguent...@suse.de>
>> >
>> >    PR tree-optimization/82220
>> >    * tree-vect-loop.c (vect_estimate_min_profitable_iters): Exclude
>> >    epilogue niters from the min_profitable_iters compute.
>> >
>> > Index: gcc/tree-vect-loop.c
>> > ===================================================================
>> > --- gcc/tree-vect-loop.c   (revision 252907)
>> > +++ gcc/tree-vect-loop.c   (working copy)
>> > @@ -3663,8 +3663,8 @@ vect_estimate_min_profitable_iters (loop
>> >           min_profitable_iters);
>> >  
>> >    /* We want the vectorized loop to execute at least once.  */
>> > -  if (min_profitable_iters < (vf + peel_iters_prologue + 
>> > peel_iters_epilogue))
>> > -    min_profitable_iters = vf + peel_iters_prologue + peel_iters_epilogue;
>> > +  if (min_profitable_iters < (vf + peel_iters_prologue))
>> > +    min_profitable_iters = vf + peel_iters_prologue;
>> 
>> Maybe we should still add 1 when peeling for gaps?
>
> You mean add vf?  Yes, I guess so.


I think 1 is enough, since peeling for gaps forces one final scalar
iteration to be done by the epilogue.  You only get vf iterations being
done by the epilogue if no iterations would have been done otherwise.

E.g. for vf=4 and no prologue peeling, a scalar iteration count of 5
would be 1 vector iteration and 1 epilogue iteration even with peeling
for gaps.  A scalar iteration count of 8 would be 1 vector iteration
and 4 epilogue iterations.  But a scalar iteration count of 4 would
not use the vector loop at all.  In that case 5 seems like the right
minimum.

>> Even adding the prologue count seems a bit weird if we've guessed it to
>> be vf/2.  Wouldn't it be more profitable to vectorise with an iteration
>> count of 1 vector iteration + 1 peeled iteration than 1 vector iteration
>> + vf-1 peeled iterations, at least in percentage terms?  Was just wondering
>> if we should only add peel_iters_prologue when npeels > 0.
>
> Well, I specifically added the code to compensate for the case where
> we only guess the count to vf/2.  I tried to make us reliably avoid
> entering the vector path for bwaves -- which of course only works
> with versioning -- where we have a runtime iteration count of 5 but
> still end up peeling for alignment.
>
> If we exactly know the iterations of the prologue we avoid peeling
> for alignment already.

Ah, OK.  For bwaves, did we peel more than one iteration for
alignment and so skip the vector loop entirely?  Or did we still
use the vector loop, but end up with something that was still
slower because of the extra overhead?

> So yes, I made the guessed case err on the scalar side (which regressed
> sphinx).  Before my patch it erred on the vector side by estimating
> the prologue end epilogue iters as 0.  I suppose we should split the
> handling to correctly handle the non-guessed case and have some
> explicit code doing sth for the guessed one.
>
> OTOH we might want to make the cost model check combined with the
> prologue peeling properly handle the prologue/epilogue iterations.
> I didn't check whether we do that (but IIRC the execution flow is
> slightly sub-optimal here).

Something like adding the number of peels to the threshold at runtime?
Yeah, that sounds like it might be better.

One problem I remember hitting was that, even if we do calculate
a reasonable profitability threshold, we'd do the check in the same
block as the versioning check, and so pay the full penalty of the
versioning check regardless.  I.e. it's "unprofitable_p | alias_p"
rather than "unprofitable_p || alias_p".  Not sure whether that's
a different problem from bwaves though.

Thanks,
Richard

Re: [PATCH] Fix PR82220

Reply via email to