On Thu, 9 Jun 2016, Jan Hubicka wrote:

> > On Thu, 9 Jun 2016, Jan Hubicka wrote:
> > 
> > > Hi,
> > > after we read the profile, we know the expected number of iterations.
> > 
> > We know the average ;)  It may make sense to add some histogram
> > value profiling for niter now that we should easily be able to do so.
> 
> I always interpreted the estimated number of iterations to be the same
> as the expected number of iterations, and to be the same as the average.
> So it seems sane to feed in the info from the profile.
> 
> I am thinking of adding the histograms, yes.  It is mildly annoying to
> do so because one needs to instrument all exit edges out of the loop.
> I guess we don't care much if the histogram gets lost on abnormals.

Can't we just instrument the loop header (and thus get "averages" for
all exits)?

> One option I am thinking about is to introduce a counter that takes
> two parameters A, B.  It will record a linear histogram for values in
> the range 0...A and a logarithmic histogram for values greater than A,
> capping at 2^B (so we don't need 64 counters for every loop).  I think
> we are not really that interested in precise histograms for loops that
> iterate more than, say, 2^10 times.  We only need to know that the
> loop iterates a lot.  We do care about low iteration counts, however,
> to make good peeling decisions.  This is still 26 counters per loop,
> which is quite a lot.

I think what we are interested in is "clusters" of niters.  So yes,
having a histogram for low values (can we use HIST_TYPE_POW2 here?)
makes sense.  Or simply sum all n <= 16 and all n > 16 values
separately?  We seem to lack a histogram type where we can specify N
differently-sized bins.
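To make the bin layout concrete, a two-parameter counter as proposed
above could map an iteration count to a bin along these lines.  This is
a sketch only: niter_bin is an invented name, not an existing gcov
histogram type, and the A == 16, B == 14 parameterization is a guess,
though it does yield the 26 bins mentioned.

  #include <stdint.h>
  #include <stdio.h>

  /* Map an iteration count to a histogram bin: one bin per value
     below A, then one bin per power-of-two range up to 2^B, with
     everything >= 2^B lumped into the last bin.  */

  static unsigned
  niter_bin (uint64_t niter, unsigned a, unsigned b)
  {
    /* Linear part: one bin per value in [0, A).  */
    if (niter < a)
      return (unsigned) niter;

    /* Logarithmic part: double the range bound for each further bin
       until the cap 2^B is reached.  */
    unsigned bin = a;
    uint64_t bound = 2 * (uint64_t) a;
    while (niter >= bound && bound < ((uint64_t) 1 << b))
      {
        bin++;
        bound *= 2;
      }
    return bin;
  }

  int
  main (void)
  {
    /* With A == 16 and B == 14 this uses bins 0..25, i.e. 26
       counters per loop.  */
    printf ("%u %u %u\n",
            niter_bin (3, 16, 14),        /* 3: linear part */
            niter_bin (100, 16, 14),      /* 18: range [64, 128) */
            niter_bin (1000000, 16, 14)); /* 25: overflow bin */
    return 0;
  }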
> A lot cheaper alternative may be to simply measure the loop peeling
> limit by having one counter that counts how often the loop exits in
> the first PARAM_MAX_PEEL_TIMES iterations, and a second counter that
> measures the maximal number of iterations in that case (we really want
> the likely maximal number of iterations, but that seems harder to
> get).  This will determine the peeling limit, which we can then store
> into the loop structure.

Sure.  In my original loop value-profiling work the most interesting
(for performance) case was value-profiling variable strides and then
versioning for stride == 1.  That triggered a lot with Fortran array
descriptors, though nowadays IPA-CP of aggregates plus LTO might have
solved this.

> The vectorizer/unroller/prefetcher and most other classical loop opts
> care about the average being large enough, so the current expected
> value seems to do the trick.

Yes, though knowing that a significant fraction of the time the
iteration count is one would also be useful.

> This does not let you update the profile very precisely after peeling
> (including peeling done by the vectorizer), but it needs only 2
> counters per loop.
> 
> What other passes, besides peeling, would immediately benefit from a
> histogram?  I wonder if you can think of a better scheme?

Epilogue peeling in the vectorizer (for the remainder of iterations,
thus profiling N % vectorization-factor, which means a value profile of
niter & <low-bits>).  Peeling for alignment profitability (we do too
much of that IMHO), thus value-profiling of <read1-addr> | <read2-addr>
| ... and <write1-addr> | <write2-addr> | ... (not sure if that would
be fine-grained enough - we want to know if peeling for alignment will
likely make many accesses aligned, not just the one we peel for).
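Roughly, and only as a sketch (VF, remainder_hist and the profile_*
helpers are invented names; no such gcov counters exist), those two
value profiles would boil down to:

  #include <stdint.h>

  #define VF 8   /* assumed power-of-two vectorization factor */

  /* Remainder profile for epilogue peeling: only the low bits of the
     iteration count matter, since niter % VF == niter & (VF - 1) for
     a power-of-two VF.  */
  static uint64_t remainder_hist[VF];

  static void
  profile_epilogue (uint64_t niter)
  {
    remainder_hist[niter & (VF - 1)]++;
  }

  /* Alignment profile: OR together the addresses of all reads (and,
     separately, all writes).  The trailing zero bits of the
     accumulated value give the alignment that every recorded access
     shares - which is also why the question above is whether a single
     OR is fine-grained enough.  */
  static void
  profile_access (uintptr_t *or_accum, const void *addr)
  {
    *or_accum |= (uintptr_t) addr;
  }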
> > > Currently we use the profile to recompute this value each time
> > > estimate_numbers_of_iterations_loop is called.  This is not very
> > > safe because the profile may be misupdated.  It seems safer to
> > > compute it once and maintain it throughout the compilation.
> > > 
> > > Notice that I removed:
> > > -  /* Force estimate computation but leave any existing upper
> > > -     bound in place.  */
> > > -  loop->any_estimate = false;
> > > from the beginning of estimate_numbers_of_iterations_loop.  I
> > > cannot make sense of this.  Even w/o a profile, if we have an
> > > estimate we are better off maintaining it, because later we may
> > > not be able to derive it again.  There seems to be no code that is
> > > forced by setting loop->any_estimate = true.  The only code that
> > > cares seems to be record_niter_bound, which only decreases
> > > existing estimates.  This seems a sane procedure - we don't roll
> > > loops.
> > > 
> > > Bootstrapped/regtested x86_64-linux, OK?
> > 
> > Ok.  Did you check what this does to SPEC with FDO?
> 
> I didn't do a full SPEC run with FDO; I only tested tramp3d and
> xalancbmk, which I have readily available.  Any regressions would,
> however, point to loop info updating bugs (or wrong use of the
> profile), so I will look into them if they appear.  I am trying to
> benchmark Firefox now.

Ok.

Thanks,
Richard.
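For readers following along, the decrease-only update described for
record_niter_bound above amounts to roughly the following sketch; the
struct and names here are illustrative stand-ins, not GCC's actual
loop representation or API.

  #include <stdbool.h>
  #include <stdint.h>

  /* Once a loop has an estimate, a new bound only ever lowers it:
     transformations may reduce how often a loop iterates, never
     increase it ("we don't roll loops"), so keeping the minimum of
     all recorded bounds stays conservative.  */

  struct loop_info
  {
    bool any_estimate;
    uint64_t nb_iterations_estimate;
  };

  static void
  record_estimate (struct loop_info *loop, uint64_t bound)
  {
    if (!loop->any_estimate || bound < loop->nb_iterations_estimate)
      {
        loop->nb_iterations_estimate = bound;
        loop->any_estimate = true;
      }
  }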