On Mon, Feb 20, 2017 at 4:05 PM, Jan Hubicka <[email protected]> wrote:
>> BTW, if we use gcov_type in calculation from
>> expected_loop_iterations_unbounded,
>> how should we adjust the number for calling scale_loop_frequencies
>> which has int type? In extreme case, gcov_type could be out of int's
>> range, we have to cap the value anyway. But yes, 10000 in
>> expect_loop_iterations is too small.
>
> What I usually do is to use fixed point math in this case (based on
> REG_BR_PROB_BASE).
> Just pass REG_BR_PROB_BASE as den and calculate the adjustment in gcov_type
> converting
> to int. Because you should be just decreasing the counts, it won't overflow
> and because
> the decarese will be in range, say 2...256 times, it should also be
> sufficiently
> precise.
>
> Be careful to avoid overflow of gcov type - it is not safe to multiply two
> counts in 64bit math because each count can be more than 2^32. (next stage1 I
> plan to switch most of this to sreals that will make this easier)
>
>> >> > But I guess here it is sort of safe because vectorized loops are simple.
>> >> > You can't just scale down the existing counts/frequencies by vf,
>> >> > because the
>> >> > entry edge frequency was adjusted.
>> >> I am not 100% follow here, it looks the code avoids changing frequency
>> >> counter for preheader/exit edge, otherwise we would need to change all
>> >> counters dominated by them?
>> >
>> > I was just wondering what is wrong with taking the existing
>> > frequencies/counts
>> > the loop body has and dividing them all by the unroll factor. This is
>> > correct
>> > if you ignore the versioning. With versioning I guess you want to scale by
>> > the unroll factor and also subtract frequencies/counts that was acocunted
>> > to
>> > the other versions of the loop, right?
>> IIUC, for (vectorized) loop header, it's frequency is calculated by:
>> freq_header = freq_preheader + freq_latch
>> and freq_latch = (niter * freq_preheader). Simply scaling it by VF gives:
>> freq_header = (freq_preheader + freq_latch) / VF
>> which is wrong. Especially if the loop is vectorized by large VF
>> (=16) and we normally assume niter (=10) without profiling
>> information, it results not only mismatch, but also
>> (loop->header->frequency < loop->preheader->frequency). In fact, if
>> we have accurate niter information, the counter should be:
>> freq_header = freq_preheader + niter * freq_preheader
>
> You are right. We need to compensate for the change of probability of the loop
> exit edge.
>>
>> >> >
>> >> > Also niter_for_unrolled_loop depends on sanity of the profile, so
>> >> > perhaps you
>> >> > need to compute it before you start chanigng the CFG by peeling
>> >> > proplogue?
>> >> Peeling for prologue doesn't change profiling information of
>> >> vect_loop, it is the skip edge from before loop to preferred epilogue
>> >> loop that will change profile counters. I guess here exists a dilemma
>> >> that niter_for_unrolled_loop is for loop after peeling for prologue?
>> >
>> > expected_loop_iterations_unbounded calculates number of iteations by
>> > computing
>> > sum of frequencies of edges entering the loop and comparing it to the
>> > frequency
>> > of loop header. While peeling the prologue, you split the preheader edge
>> > and
>> > adjust frequency of the new preheader BB of the loop to be vectorized. I
>> > think
>> > that will adjust the #of iterations estimate.
>> It's not the case now I think. one motivation of new vect_do_peeling
>> is to avoid niter checks between prologue and vector loop. Once
>> prologue loop is entered or checked, the vector loop must be executed
>> unconditionally. So the preheaderof vector loop has consistent
>> frequency counters now. The niter check on whether vector loop should
>> be executed is now merged with cost check before prologue, and in the
>> future I think this can be further merged if loop versioning is
>> needed.
>
> Originally you have
>
> loop_preheader
> |
> v
> loop_header
>
> and the ratio of the two BB frequencies is the loop iteration count. Then you
> do something like:
>
> orig_loop_preheader
> |
> v
> loop_prologue -----> scalar_version_of_loop
> |
> v
> new_loop_preheader
> |
> v
> loop_header
It's like:
orig_loop_preheader
|
v
check_on_niter -----> scalar_version_of_loop (i.e, epilog_loop)
|
v
loop_prologue
|
v
new_loop_preheader
|
v
loop_header
Yes, the preheader/header need to be consistent here, and it is now.
Thanks for explaining.
> At some point, you need to update new_loop_preheader frequency/count
> to reflect the fact that with some probability the loop_prologue avoids
> the vectorized loop. Once you do it and if you don't scale frequency of
> loop_header you will make expect_loop_iterations to return higher value
> than previously.
>
> So at the time you are calling it, you need to be sure that the loop_header
> and its preheader frequences was both adjusted by same factor. Or you need
> to call it early before you start hacking on the CFG and its profile.
>
> Pehaps currently it is safe, because your peeling code is also scaling
> the loop profiles.
I will revise the patch according to this discussion.
Thanks,
bin
>
> Honza
>>
>> Thanks,
>> bin
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >> bin
>> >> >
>> >> > Finally if freq_e is 0, all frequencies and counts will be probably
>> >> > dropped to
>> >> > 0. What about determining fraction by counts if they are available?
>> >> >
>> >> > Otherwise the patch looks good and thanks a lot for working on this!
>> >> >
>> >> > Honza
>> >> >
>> >> >> >
>> >> >> > gcc/testsuite/ChangeLog
>> >> >> > 2017-02-16 Bin Cheng <[email protected]>
>> >> >> >
>> >> >> > PR tree-optimization/77536
>> >> >> > * gcc.dg/vect/pr79347.c: Revise testing string.