Re: [PATCH PR77536]Generate correct profiling information for vectorized loop

Bin.Cheng Mon, 20 Feb 2017 09:15:32 -0800

On Mon, Feb 20, 2017 at 4:05 PM, Jan Hubicka <[email protected]> wrote:
>> BTW, if we use gcov_type in calculation from 
>> expected_loop_iterations_unbounded,
>> how should we adjust the number for calling scale_loop_frequencies
>> which has int type?  In extreme case, gcov_type could be out of int's
>> range, we have to cap the value anyway.  But yes, 10000 in
>> expect_loop_iterations is too small.
>
> What I usually do is to use fixed point math in this case (based on 
> REG_BR_PROB_BASE).
> Just pass REG_BR_PROB_BASE as den and calculate the adjustment in gcov_type 
> converting
> to int. Because you should be just decreasing the counts, it won't overflow 
> and because
> the decarese will be in range, say 2...256 times, it should also be 
> sufficiently
> precise.
>
> Be careful to avoid overflow of gcov type - it is not safe to multiply two
> counts in 64bit math because each count can be more than 2^32.  (next stage1 I
> plan to switch most of this to sreals that will make this easier)
>
>> >> > But I guess here it is sort of safe because vectorized loops are simple.
>> >> > You can't just scale down the existing counts/frequencies by vf, 
>> >> > because the
>> >> > entry edge frequency was adjusted.
>> >> I am not 100% follow here, it looks the code avoids changing frequency
>> >> counter for preheader/exit edge, otherwise we would need to change all
>> >> counters dominated by them?
>> >
>> > I was just wondering what is wrong with taking the existing 
>> > frequencies/counts
>> > the loop body has and dividing them all by the unroll factor.  This is 
>> > correct
>> > if you ignore the versioning. With versioning I guess you want to scale by
>> > the unroll factor and also subtract frequencies/counts that was acocunted 
>> > to
>> > the other versions of the loop, right?
>> IIUC, for (vectorized) loop header, it's frequency is calculated by:
>>           freq_header = freq_preheader + freq_latch
>> and freq_latch = (niter * freq_preheader).  Simply scaling it by VF gives:
>>           freq_header = (freq_preheader + freq_latch) / VF
>> which is wrong.  Especially if the loop is vectorized by large VF
>> (=16) and we normally assume niter (=10) without profiling
>> information, it results not only mismatch, but also
>> (loop->header->frequency < loop->preheader->frequency).  In fact, if
>> we have accurate niter information, the counter should be:
>>           freq_header = freq_preheader + niter * freq_preheader
>
> You are right. We need to compensate for the change of probability of the loop
> exit edge.
>>
>> >> >
>> >> > Also niter_for_unrolled_loop depends on sanity of the profile, so 
>> >> > perhaps you
>> >> > need to compute it before you start chanigng the CFG by peeling 
>> >> > proplogue?
>> >> Peeling for prologue doesn't change profiling information of
>> >> vect_loop, it is the skip edge from before loop to preferred epilogue
>> >> loop that will change profile counters.  I guess here exists a dilemma
>> >> that niter_for_unrolled_loop is for loop after peeling for prologue?
>> >
>> > expected_loop_iterations_unbounded calculates number of iteations by 
>> > computing
>> > sum of frequencies of edges entering the loop and comparing it to the 
>> > frequency
>> > of loop header.  While peeling the prologue, you split the preheader edge 
>> > and
>> > adjust frequency of the new preheader BB of the loop to be vectorized.  I 
>> > think
>> > that will adjust the #of iterations estimate.
>> It's not the case now I think.  one motivation of new vect_do_peeling
>> is to avoid niter checks between prologue and vector loop.  Once
>> prologue loop is entered or checked, the vector loop must be executed
>> unconditionally.  So the preheaderof vector loop has consistent
>> frequency counters now.  The niter check on whether vector loop should
>> be executed is now merged with cost check before prologue, and in the
>> future I think this can be further merged if loop versioning is
>> needed.
>
> Originally you have
>
>   loop_preheader
>        |
>        v
>    loop_header
>
> and the ratio of the two BB frequencies is the loop iteration count. Then you
> do something like:
>
>   orig_loop_preheader
>        |
>        v
>    loop_prologue -----> scalar_version_of_loop
>        |
>        v
>  new_loop_preheader
>        |
>        v
>    loop_header
It's like:


   orig_loop_preheader
        |
        v
    check_on_niter -----> scalar_version_of_loop (i.e, epilog_loop)
        |
        v
    loop_prologue
        |
        v
  new_loop_preheader
        |
        v
    loop_header

Yes, the preheader/header need to be consistent here, and it is now.
Thanks for explaining.

> At some point, you need to update new_loop_preheader frequency/count
> to reflect the fact that with some probability the loop_prologue avoids
> the vectorized loop.  Once you do it and if you don't scale frequency of
> loop_header you will make expect_loop_iterations to return higher value
> than previously.
>
> So at the time you are calling it, you need to be sure that the loop_header
> and its preheader frequences was both adjusted by same factor.  Or you need
> to call it early before you start hacking on the CFG and its profile.
>
> Pehaps currently it is safe, because your peeling code is also scaling
> the loop profiles.

I will revise the patch according to this discussion.

Thanks,
bin
>
> Honza
>>
>> Thanks,
>> bin
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >> bin
>> >> >
>> >> > Finally if freq_e is 0, all frequencies and counts will be probably 
>> >> > dropped to
>> >> > 0.  What about determining fraction by counts if they are available?
>> >> >
>> >> > Otherwise the patch looks good and thanks a lot for working on this!
>> >> >
>> >> > Honza
>> >> >
>> >> >> >
>> >> >> > gcc/testsuite/ChangeLog
>> >> >> > 2017-02-16  Bin Cheng  <[email protected]>
>> >> >> >
>> >> >> >         PR tree-optimization/77536
>> >> >> >         * gcc.dg/vect/pr79347.c: Revise testing string.

Re: [PATCH PR77536]Generate correct profiling information for vectorized loop

Reply via email to