On Fri, 10 Jun 2016, Jan Hubicka wrote:

> > > I always interpreted the estimated number of iterations to be the same as
> > > expected number of iterations and to be same as average.  So it seems to 
> > > be
> > > sane to feed the info from profile.
> > > 
> > > I am thinking to add the histograms, yes.  It is midly anoying to do so 
> > > becuase
> > > one needs to intrstrument all exit edges out of loop. I guess we don't 
> > > care
> > > much if histogram will get lost on abnormals.
> > 
> > Can't we just instrument the loop header (and thus get "averages" for all
> > exits)?
> 
> You need to know number of iterations, so either you save it into a global and
> flush into profile next time edge header->preheader is executed or you 
> instrument
> exists. At least when one wants something more precise than average iteration 
> count
> I think.
> > 
> > > One option I am thinking about is to introduce counter that will take two
> > > parameters A,B.  It will record linear histogram for values in range 
> > > 0...A and
> > > logarithmic histogram for values greater than A capping by 2^B (so we 
> > > don't
> > > need 64 counters for every loop).  I think we are not really that 
> > > interested in
> > > precise histograms for loops that iterate more than, say, 2^10 times. We 
> > > only
> > > need to know that it loops a lot.  We however care about low iteration 
> > > counts
> > > to make good decision for peeling.  This is still 26 counters per loop 
> > > that is
> > > quite a lot.  
> > 
> > I think what we are interested in is "clusters" of niters.  So yes,
> > having a histogram for low values (can we use HIST_TYPE_POW2 here?)
> > makes sense.  Or simply sum all n <= 16 and all n > 16 values separately?
> > We seem to lack a histogram type where we can specify N differently-sized
> > bins.
> 
> HIST_TYPE_POW2 measure how often value is exactly power of two and we don't
> have histogram counter at all, so we need to introduce one ;)
> 
> We have HIST_TYPE_INTERVAL that is real linear histogram for given interval 
> of values.
> > 
> > > A lot cheaper alternative may be to simply measure loop peeling limit by 
> > > having counter that counts how often loop exits in first 
> > > PARAM_MAX_PEEL_TIMES iterations and second counter that measure what is 
> > > the maximal number of iterations in this case (we really want likely 
> > > maximal number of iterations but that seems harder to get). This will 
> > > determine peeling limit which we can then store into loop structure.
> > 
> > Sure.
> > 
> > In my original loop value-profiling work the most interesting (for
> > performance) case was value-profiling variable strides and then
> > versioning for stride == 1.  That triggered a lot with fortran
> > array descriptors though nowadays IPA-CP of aggregates plus LTO
> > might have solved this.
> 
> ipa-cp is not very powerful on propagating those. Measuring strides is
> interesting idea. I forgot about your work on the loop histograms. Do you
> still have patches around?

No, but those weren't complete anyway as we didn't preserve loops
back in time.  I think I only implemented versioning assuming the
stride would be one unconditionally to show the possible benefit
(it was for the 2006 Summit).

Richard.

> > 
> > > Vectorizer/unroller/prefetcher and most of other classical loop opts 
> > > care about the average being large enough so the current expected value 
> > > seems to do the trick.
> > 
> > Yes, though knowing that a significant fraction of times the iteration
> > count is one would also be useful.
> 
> Yes, if we interval profile the low iteration counts, we will have exact
> info on 0,1...n intrations.
> > 
> > > This does not let you to update profile very precisely after peeling (and 
> > > also peeling
> > > done by vectorizer), but it needs only 2 counters per loop.
> > > 
> > > What other passes, beside peeling, would immediately benefit from a 
> > > historgram? I wonder if you can think of better scheme?
> > 
> > Epilogue peeling in the vectorizer (for the remainder of iterations,
> > thus profiling N % vectorization-factor which means a value-profile
> > of niter & <low-bits>).
> 
> Hmm, to predict that we will need to know the exact value modulo vectorization
> factor. This seems bit hard to measure an I am not sure how improtant it is.
> > 
> > Peeling for alignment profitability (we do too much of that IMHO),
> > thus value-profiling of <read1-addr> | <read2-addr> | ...
> > and <write1-addr> | <write2-addr> | ... (not sure if that would
> > be fine-grained enough - we want to know if peeling for alignment
> > will likely make many accesses aligned, not just the one we peel for).
> 
> Peeling for alingmnet can probably be predicted by the power of 2 predictor - 
> that
> is already used by string routines. For that we need to identify which 
> alignments
> we want to test at the vrp time and probably we can avoid alignmnet peeling 
> in favour
> of versioning for loops that work on aligned data in practice.
> 
> Honza
> > 
> > > > 
> > > > > Currently we use profile each time estimate_numbers_of_iterations_loop
> > > > > is called to recompute this value.  This is not very safe because the
> > > > > profile may be misupdated.  It seems safer to compute it at once and
> > > > > maintain thorough the compilation.
> > > > > 
> > > > > Notice that I removed:
> > > > > -  /* Force estimate compuation but leave any existing upper bound in 
> > > > > place.  */
> > > > > -  loop->any_estimate = false;
> > > > > From beggining of estimate_numbers_of_iterations_loop.  I can not 
> > > > > make sense of
> > > > > this. Even w/o profile if we have estimate, we are better to maintain 
> > > > > it because
> > > > > later we may not be able to derive it again.
> > > > > There seems to be no code that is forced by setting 
> > > > > loop->any_estimate = true.
> > > > > Only code that cares seems to be record_niter_bound that only 
> > > > > decreases existing
> > > > > estimates. THis seems sane procedure - we don't roll loops.
> > > > > 
> > > > > Bootstrapped/regtested x86_64-linux, OK?
> > > > 
> > > > Ok.  Did you check what this does to SPEC with FDO?
> > > 
> > > I didn't do full SPEC run with FDO, only tested tramp3d and xalancbmk I 
> > > have
> > > readily available.  Any regressions would however point to loop info 
> > > updating
> > > bugs (or wrong use of the profile) so I will look into them if they 
> > > appear.
> > > I am trying to benchmark firefox now.
> > 
> > Ok.
> > 
> > Thanks,
> > Richard.
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Reply via email to