Re: [PATCH PR77536]Generate correct profiling information for vectorized loop

Jan Hubicka Mon, 20 Feb 2017 07:17:44 -0800

> > +  /* Without profile feedback, loops for which we do not know a better 
> > estimate
> > +     are assumed to roll 10 times.  When we unroll such loop, it appears to
> > +     roll too little, and it may even seem to be cold.  To avoid this, we
> > +     ensure that the created loop appears to roll at least 5 times (but at
> > +     most as many times as before unrolling).  */
> > +  if (new_est_niter < 5)
> > +    {
> > +      if (est_niter < 5)
> > +       new_est_niter = est_niter;
> > +      else
> > +       new_est_niter = 5;
> > +    }
> > +
> > +  return new_est_niter;
> > +}
> >
> > I see this code is pre-existing, but please extend it to test if
> > loop->header->count is non-zero.  Even if we do not have idea about loop
> > iteration count estimate we may end up predicting more than 10 iterations 
> > when
> > predictors combine that way.
> If I use expected_loop_iterations_unbounded, then do I need to handle
> loop->header->count explicitly here?  I suppose not because it has
> below code already:
> 
>   /* If we have no profile at all, use AVG_LOOP_NITER.  */
>   if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
>     expected = PARAM_VALUE (PARAM_AVG_LOOP_NITER);
>   else if (loop->latch && (loop->latch->count || loop->header->count))
>     {
>       gcov_type count_in, count_latch;
>       //...


expected_loop_iterations_unbounded avoids capping the # of iterations comming
from profile feedback (so if loop iterates 1000000 it retns 1000000 instead of
10000). Testing loop->header->count for non-zero tests if the loop was ever
entered in the train run so it is easy test if you have any profile feedback
for a given loop available.

I profile feedback is there, then making usre that new_est_niter is at least
5 will only make feedback less acurate.  I am not sure if we vectorize loop
that will iterate only 1...4 times after unrolling, but I guess we could,
and it would be desirable to peel it afterwards.
> 
> The second question is: looks like it only takes latch->count into
> consideration when PROFILE_ABSENT.  But according to your comments, we
> could have nonzero count sometime?

profile_status_for_fn (cfun) == PROFILE_ABSENT is set only when there is no
profile esitmate at all - i.e. you compile with
-fno-guess-branch-probabilities.  We don't really use that path very often
but in that case you have no data to guess your profile from.

> 
> >
> > Perhaps testing estimated-loop_iterations would also make sense, but that
> > could be dealt with incrementally.
> >
> > +static void
> > +scale_profile_for_vect_loop (struct loop *loop, unsigned vf)
> > +{
> > +  unsigned freq_h = loop->header->frequency;
> > +  unsigned freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop));
> > +  /* Reduce loop iterations by the vectorization factor.  */
> > +  unsigned new_est_niter = niter_for_unrolled_loop (loop, vf);
> > +
> > +  if (freq_h != 0)
> > +    scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h);
> > +
> > I am always trying to avoid propagating small mistakes (i.e. frong freq_h or
> > freq_h) into bigger mistakes (i.e. wrong profile of the whole loop) to avoid
> > spreading mistakes across cfg.
> >
> > But I guess here it is sort of safe because vectorized loops are simple.
> > You can't just scale down the existing counts/frequencies by vf, because the
> > entry edge frequency was adjusted.
> I am not 100% follow here, it looks the code avoids changing frequency
> counter for preheader/exit edge, otherwise we would need to change all
> counters dominated by them?

I was just wondering what is wrong with taking the existing frequencies/counts
the loop body has and dividing them all by the unroll factor.  This is correct
if you ignore the versioning. With versioning I guess you want to scale by
the unroll factor and also subtract frequencies/counts that was acocunted to
the other versions of the loop, right?
> >
> > Also niter_for_unrolled_loop depends on sanity of the profile, so perhaps 
> > you
> > need to compute it before you start chanigng the CFG by peeling proplogue?
> Peeling for prologue doesn't change profiling information of
> vect_loop, it is the skip edge from before loop to preferred epilogue
> loop that will change profile counters.  I guess here exists a dilemma
> that niter_for_unrolled_loop is for loop after peeling for prologue?

expected_loop_iterations_unbounded calculates number of iteations by computing
sum of frequencies of edges entering the loop and comparing it to the frequency
of loop header.  While peeling the prologue, you split the preheader edge and
adjust frequency of the new preheader BB of the loop to be vectorized.  I think
that will adjust the #of iterations estimate.

Honza
> 
> Thanks,
> bin
> >
> > Finally if freq_e is 0, all frequencies and counts will be probably dropped 
> > to
> > 0.  What about determining fraction by counts if they are available?
> >
> > Otherwise the patch looks good and thanks a lot for working on this!
> >
> > Honza
> >
> >> >
> >> > gcc/testsuite/ChangeLog
> >> > 2017-02-16  Bin Cheng  <bin.ch...@arm.com>
> >> >
> >> >         PR tree-optimization/77536
> >> >         * gcc.dg/vect/pr79347.c: Revise testing string.

Re: [PATCH PR77536]Generate correct profiling information for vectorized loop

Reply via email to