On Mon, Feb 20, 2017 at 4:05 PM, Jan Hubicka <hubi...@ucw.cz> wrote: >> BTW, if we use gcov_type in calculation from >> expected_loop_iterations_unbounded, >> how should we adjust the number for calling scale_loop_frequencies >> which has int type? In extreme case, gcov_type could be out of int's >> range, we have to cap the value anyway. But yes, 10000 in >> expect_loop_iterations is too small. > > What I usually do is to use fixed point math in this case (based on > REG_BR_PROB_BASE). > Just pass REG_BR_PROB_BASE as den and calculate the adjustment in gcov_type > converting > to int. Because you should be just decreasing the counts, it won't overflow > and because > the decarese will be in range, say 2...256 times, it should also be > sufficiently > precise. > > Be careful to avoid overflow of gcov type - it is not safe to multiply two > counts in 64bit math because each count can be more than 2^32. (next stage1 I > plan to switch most of this to sreals that will make this easier) > >> >> > But I guess here it is sort of safe because vectorized loops are simple. >> >> > You can't just scale down the existing counts/frequencies by vf, >> >> > because the >> >> > entry edge frequency was adjusted. >> >> I am not 100% follow here, it looks the code avoids changing frequency >> >> counter for preheader/exit edge, otherwise we would need to change all >> >> counters dominated by them? >> > >> > I was just wondering what is wrong with taking the existing >> > frequencies/counts >> > the loop body has and dividing them all by the unroll factor. This is >> > correct >> > if you ignore the versioning. With versioning I guess you want to scale by >> > the unroll factor and also subtract frequencies/counts that was acocunted >> > to >> > the other versions of the loop, right? >> IIUC, for (vectorized) loop header, it's frequency is calculated by: >> freq_header = freq_preheader + freq_latch >> and freq_latch = (niter * freq_preheader). Simply scaling it by VF gives: >> freq_header = (freq_preheader + freq_latch) / VF >> which is wrong. Especially if the loop is vectorized by large VF >> (=16) and we normally assume niter (=10) without profiling >> information, it results not only mismatch, but also >> (loop->header->frequency < loop->preheader->frequency). In fact, if >> we have accurate niter information, the counter should be: >> freq_header = freq_preheader + niter * freq_preheader > > You are right. We need to compensate for the change of probability of the loop > exit edge. >> >> >> > >> >> > Also niter_for_unrolled_loop depends on sanity of the profile, so >> >> > perhaps you >> >> > need to compute it before you start chanigng the CFG by peeling >> >> > proplogue? >> >> Peeling for prologue doesn't change profiling information of >> >> vect_loop, it is the skip edge from before loop to preferred epilogue >> >> loop that will change profile counters. I guess here exists a dilemma >> >> that niter_for_unrolled_loop is for loop after peeling for prologue? >> > >> > expected_loop_iterations_unbounded calculates number of iteations by >> > computing >> > sum of frequencies of edges entering the loop and comparing it to the >> > frequency >> > of loop header. While peeling the prologue, you split the preheader edge >> > and >> > adjust frequency of the new preheader BB of the loop to be vectorized. I >> > think >> > that will adjust the #of iterations estimate. >> It's not the case now I think. one motivation of new vect_do_peeling >> is to avoid niter checks between prologue and vector loop. Once >> prologue loop is entered or checked, the vector loop must be executed >> unconditionally. So the preheaderof vector loop has consistent >> frequency counters now. The niter check on whether vector loop should >> be executed is now merged with cost check before prologue, and in the >> future I think this can be further merged if loop versioning is >> needed. > > Originally you have > > loop_preheader > | > v > loop_header > > and the ratio of the two BB frequencies is the loop iteration count. Then you > do something like: > > orig_loop_preheader > | > v > loop_prologue -----> scalar_version_of_loop > | > v > new_loop_preheader > | > v > loop_header It's like:
orig_loop_preheader | v check_on_niter -----> scalar_version_of_loop (i.e, epilog_loop) | v loop_prologue | v new_loop_preheader | v loop_header Yes, the preheader/header need to be consistent here, and it is now. Thanks for explaining. > At some point, you need to update new_loop_preheader frequency/count > to reflect the fact that with some probability the loop_prologue avoids > the vectorized loop. Once you do it and if you don't scale frequency of > loop_header you will make expect_loop_iterations to return higher value > than previously. > > So at the time you are calling it, you need to be sure that the loop_header > and its preheader frequences was both adjusted by same factor. Or you need > to call it early before you start hacking on the CFG and its profile. > > Pehaps currently it is safe, because your peeling code is also scaling > the loop profiles. I will revise the patch according to this discussion. Thanks, bin > > Honza >> >> Thanks, >> bin >> > >> > Honza >> >> >> >> Thanks, >> >> bin >> >> > >> >> > Finally if freq_e is 0, all frequencies and counts will be probably >> >> > dropped to >> >> > 0. What about determining fraction by counts if they are available? >> >> > >> >> > Otherwise the patch looks good and thanks a lot for working on this! >> >> > >> >> > Honza >> >> > >> >> >> > >> >> >> > gcc/testsuite/ChangeLog >> >> >> > 2017-02-16 Bin Cheng <bin.ch...@arm.com> >> >> >> > >> >> >> > PR tree-optimization/77536 >> >> >> > * gcc.dg/vect/pr79347.c: Revise testing string.