On Mon, Feb 20, 2017 at 2:02 PM, Jan Hubicka <hubi...@ucw.cz> wrote: >> > 2017-02-16 Bin Cheng <bin.ch...@arm.com> >> > >> > PR tree-optimization/77536 >> > * tree-ssa-loop-manip.c (niter_for_unrolled_loop): New function. >> > (tree_transform_and_unroll_loop): Use above function to compute the >> > estimated niter of unrolled loop. >> > * tree-ssa-loop-manip.h niter_for_unrolled_loop(): New declaration. >> > * tree-vect-loop.c (scale_profile_for_vect_loop): New function. >> > (vect_transform_loop): Call above function. Thanks very much for your suggestions. I don't know profiling logic very well and have some questions embedded below before I start revising patch. > > +/* Return estimated niter for LOOP after unrolling by FACTOR times. */ > + > +unsigned > +niter_for_unrolled_loop (struct loop *loop, unsigned factor) > +{ > + unsigned est_niter = expected_loop_iterations (loop); > > What happens when you have profile and loop iterates very many times? > Perhaps we want to do all calculation in gcov_type and use > expected_loop_iterations_unbounded>? > > expected_loop_iterations is capping by 10000 that is easy to overflow. > > + gcc_assert (factor != 0); > + unsigned new_est_niter = est_niter / factor; > + > + /* Without profile feedback, loops for which we do not know a better > estimate > + are assumed to roll 10 times. When we unroll such loop, it appears to > + roll too little, and it may even seem to be cold. To avoid this, we > + ensure that the created loop appears to roll at least 5 times (but at > + most as many times as before unrolling). */ > + if (new_est_niter < 5) > + { > + if (est_niter < 5) > + new_est_niter = est_niter; > + else > + new_est_niter = 5; > + } > + > + return new_est_niter; > +} > > I see this code is pre-existing, but please extend it to test if > loop->header->count is non-zero. Even if we do not have idea about loop > iteration count estimate we may end up predicting more than 10 iterations when > predictors combine that way. If I use expected_loop_iterations_unbounded, then do I need to handle loop->header->count explicitly here? I suppose not because it has below code already:
/* If we have no profile at all, use AVG_LOOP_NITER. */ if (profile_status_for_fn (cfun) == PROFILE_ABSENT) expected = PARAM_VALUE (PARAM_AVG_LOOP_NITER); else if (loop->latch && (loop->latch->count || loop->header->count)) { gcov_type count_in, count_latch; //... The second question is: looks like it only takes latch->count into consideration when PROFILE_ABSENT. But according to your comments, we could have nonzero count sometime? > > Perhaps testing estimated-loop_iterations would also make sense, but that > could be dealt with incrementally. > > +static void > +scale_profile_for_vect_loop (struct loop *loop, unsigned vf) > +{ > + unsigned freq_h = loop->header->frequency; > + unsigned freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop)); > + /* Reduce loop iterations by the vectorization factor. */ > + unsigned new_est_niter = niter_for_unrolled_loop (loop, vf); > + > + if (freq_h != 0) > + scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h); > + > I am always trying to avoid propagating small mistakes (i.e. frong freq_h or > freq_h) into bigger mistakes (i.e. wrong profile of the whole loop) to avoid > spreading mistakes across cfg. > > But I guess here it is sort of safe because vectorized loops are simple. > You can't just scale down the existing counts/frequencies by vf, because the > entry edge frequency was adjusted. I am not 100% follow here, it looks the code avoids changing frequency counter for preheader/exit edge, otherwise we would need to change all counters dominated by them? > > Also niter_for_unrolled_loop depends on sanity of the profile, so perhaps you > need to compute it before you start chanigng the CFG by peeling proplogue? Peeling for prologue doesn't change profiling information of vect_loop, it is the skip edge from before loop to preferred epilogue loop that will change profile counters. I guess here exists a dilemma that niter_for_unrolled_loop is for loop after peeling for prologue? Thanks, bin > > Finally if freq_e is 0, all frequencies and counts will be probably dropped to > 0. What about determining fraction by counts if they are available? > > Otherwise the patch looks good and thanks a lot for working on this! > > Honza > >> > >> > gcc/testsuite/ChangeLog >> > 2017-02-16 Bin Cheng <bin.ch...@arm.com> >> > >> > PR tree-optimization/77536 >> > * gcc.dg/vect/pr79347.c: Revise testing string.