On Fri, 10 Jun 2016, Jan Hubicka wrote: > > > I always interpreted the estimated number of iterations to be the same as > > > expected number of iterations and to be same as average. So it seems to > > > be > > > sane to feed the info from profile. > > > > > > I am thinking to add the histograms, yes. It is midly anoying to do so > > > becuase > > > one needs to intrstrument all exit edges out of loop. I guess we don't > > > care > > > much if histogram will get lost on abnormals. > > > > Can't we just instrument the loop header (and thus get "averages" for all > > exits)? > > You need to know number of iterations, so either you save it into a global and > flush into profile next time edge header->preheader is executed or you > instrument > exists. At least when one wants something more precise than average iteration > count > I think. > > > > > One option I am thinking about is to introduce counter that will take two > > > parameters A,B. It will record linear histogram for values in range > > > 0...A and > > > logarithmic histogram for values greater than A capping by 2^B (so we > > > don't > > > need 64 counters for every loop). I think we are not really that > > > interested in > > > precise histograms for loops that iterate more than, say, 2^10 times. We > > > only > > > need to know that it loops a lot. We however care about low iteration > > > counts > > > to make good decision for peeling. This is still 26 counters per loop > > > that is > > > quite a lot. > > > > I think what we are interested in is "clusters" of niters. So yes, > > having a histogram for low values (can we use HIST_TYPE_POW2 here?) > > makes sense. Or simply sum all n <= 16 and all n > 16 values separately? > > We seem to lack a histogram type where we can specify N differently-sized > > bins. > > HIST_TYPE_POW2 measure how often value is exactly power of two and we don't > have histogram counter at all, so we need to introduce one ;) > > We have HIST_TYPE_INTERVAL that is real linear histogram for given interval > of values. > > > > > A lot cheaper alternative may be to simply measure loop peeling limit by > > > having counter that counts how often loop exits in first > > > PARAM_MAX_PEEL_TIMES iterations and second counter that measure what is > > > the maximal number of iterations in this case (we really want likely > > > maximal number of iterations but that seems harder to get). This will > > > determine peeling limit which we can then store into loop structure. > > > > Sure. > > > > In my original loop value-profiling work the most interesting (for > > performance) case was value-profiling variable strides and then > > versioning for stride == 1. That triggered a lot with fortran > > array descriptors though nowadays IPA-CP of aggregates plus LTO > > might have solved this. > > ipa-cp is not very powerful on propagating those. Measuring strides is > interesting idea. I forgot about your work on the loop histograms. Do you > still have patches around?
No, but those weren't complete anyway as we didn't preserve loops back in time. I think I only implemented versioning assuming the stride would be one unconditionally to show the possible benefit (it was for the 2006 Summit). Richard. > > > > > Vectorizer/unroller/prefetcher and most of other classical loop opts > > > care about the average being large enough so the current expected value > > > seems to do the trick. > > > > Yes, though knowing that a significant fraction of times the iteration > > count is one would also be useful. > > Yes, if we interval profile the low iteration counts, we will have exact > info on 0,1...n intrations. > > > > > This does not let you to update profile very precisely after peeling (and > > > also peeling > > > done by vectorizer), but it needs only 2 counters per loop. > > > > > > What other passes, beside peeling, would immediately benefit from a > > > historgram? I wonder if you can think of better scheme? > > > > Epilogue peeling in the vectorizer (for the remainder of iterations, > > thus profiling N % vectorization-factor which means a value-profile > > of niter & <low-bits>). > > Hmm, to predict that we will need to know the exact value modulo vectorization > factor. This seems bit hard to measure an I am not sure how improtant it is. > > > > Peeling for alignment profitability (we do too much of that IMHO), > > thus value-profiling of <read1-addr> | <read2-addr> | ... > > and <write1-addr> | <write2-addr> | ... (not sure if that would > > be fine-grained enough - we want to know if peeling for alignment > > will likely make many accesses aligned, not just the one we peel for). > > Peeling for alingmnet can probably be predicted by the power of 2 predictor - > that > is already used by string routines. For that we need to identify which > alignments > we want to test at the vrp time and probably we can avoid alignmnet peeling > in favour > of versioning for loops that work on aligned data in practice. > > Honza > > > > > > > > > > > Currently we use profile each time estimate_numbers_of_iterations_loop > > > > > is called to recompute this value. This is not very safe because the > > > > > profile may be misupdated. It seems safer to compute it at once and > > > > > maintain thorough the compilation. > > > > > > > > > > Notice that I removed: > > > > > - /* Force estimate compuation but leave any existing upper bound in > > > > > place. */ > > > > > - loop->any_estimate = false; > > > > > From beggining of estimate_numbers_of_iterations_loop. I can not > > > > > make sense of > > > > > this. Even w/o profile if we have estimate, we are better to maintain > > > > > it because > > > > > later we may not be able to derive it again. > > > > > There seems to be no code that is forced by setting > > > > > loop->any_estimate = true. > > > > > Only code that cares seems to be record_niter_bound that only > > > > > decreases existing > > > > > estimates. THis seems sane procedure - we don't roll loops. > > > > > > > > > > Bootstrapped/regtested x86_64-linux, OK? > > > > > > > > Ok. Did you check what this does to SPEC with FDO? > > > > > > I didn't do full SPEC run with FDO, only tested tramp3d and xalancbmk I > > > have > > > readily available. Any regressions would however point to loop info > > > updating > > > bugs (or wrong use of the profile) so I will look into them if they > > > appear. > > > I am trying to benchmark firefox now. > > > > Ok. > > > > Thanks, > > Richard. > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)