On Thu, 9 Jun 2016, Jan Hubicka wrote:

> > On Thu, 9 Jun 2016, Jan Hubicka wrote:
> > 
> > > Hi,
> > > after we read the profile, we know the expected number of iterations.
> > 
> > We know the average ;)  It may make sense to add some histogram
> > value profiling for niter now that we should easily be able to do so.
> 
> I always interpreted the estimated number of iterations to be the same as
> the expected number of iterations, and the same as the average.  So it
> seems sane to feed in the info from the profile.
> 
> I am thinking of adding the histograms, yes.  It is mildly annoying to do
> so because one needs to instrument all exit edges out of the loop.  I guess
> we don't care much if the histogram gets lost on abnormal edges.

Can't we just instrument the loop header (and thus get "averages" for all
exits)?

> One option I am thinking about is to introduce a counter that takes two
> parameters A,B.  It will record a linear histogram for values in the range
> 0...A and a logarithmic histogram for values greater than A, capped at 2^B
> (so we don't need 64 counters for every loop).  I think we are not really
> that interested in precise histograms for loops that iterate more than,
> say, 2^10 times.  We only need to know that such a loop iterates a lot.
> We do however care about low iteration counts, to make good decisions for
> peeling.  This is still 26 counters per loop, which is quite a lot.

I think what we are interested in is "clusters" of niters.  So yes,
having a histogram for low values (can we use HIST_TYPE_POW2 here?)
makes sense.  Or simply sum all n <= 16 and all n > 16 values separately?
We seem to lack a histogram type where we can specify N differently-sized
bins.
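
To make that concrete, here is a sketch of the bucket selection such a
combined linear/logarithmic counter could use - the helper name and the
exact bucket layout are made up for illustration, not existing GCC code:

  /* Map an iteration count to a histogram bucket: one bucket per value
     below A, then one bucket per power of two, with everything at or
     above 2^B sharing the final bucket.  */
  static unsigned
  niter_bucket (unsigned long long n, unsigned a, unsigned b)
  {
    unsigned log2a = 0, log2n = 0;
    if (n < a)
      return n;                     /* linear part: bucket per value */
    while ((a >> (log2a + 1)) != 0)
      log2a++;                      /* floor_log2 (a) */
    while ((n >> (log2n + 1)) != 0)
      log2n++;                      /* floor_log2 (n) */
    if (log2n > b)
      log2n = b;                    /* cap: all n >= 2^B share a bucket */
    return a + (log2n - log2a);     /* logarithmic part */
  }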

> A much cheaper alternative may be to simply measure the loop peeling limit
> by having one counter that counts how often the loop exits within the first
> PARAM_MAX_PEEL_TIMES iterations and a second counter that measures the
> maximal number of iterations in that case (we really want the likely
> maximal number of iterations, but that seems harder to get).  This will
> determine the peeling limit, which we can then store into the loop
> structure.

Sure.
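
Something like the following at each instrumented exit would do, I
suppose (a sketch; the counter layout and names are hypothetical):

  /* niter holds the number of iterations this execution of the loop
     made before exiting; counters[] is the per-loop counter section.  */
  if (niter <= PARAM_MAX_PEEL_TIMES)
    {
      counters[0]++;            /* exits within the peeling limit */
      if (niter > counters[1])
        counters[1] = niter;    /* running max of niter for those runs */
    }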

In my original loop value-profiling work the most interesting (for
performance) case was value-profiling variable strides and then
versioning for stride == 1.  That triggered a lot with Fortran
array descriptors, though nowadays IPA-CP of aggregates plus LTO
might have solved this.
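
At the source level that versioning amounts to the following (variable
names made up):

  if (stride == 1)
    /* Unit-stride version the vectorizer can handle.  */
    for (i = 0; i < n; i++)
      a[i] += b[i];
  else
    /* General strided version.  */
    for (i = 0; i < n; i++)
      a[i * stride] += b[i * stride];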

> Vectorizer/unroller/prefetcher and most other classical loop opts
> care about the average being large enough, so the current expected value
> seems to do the trick.

Yes, though knowing that the iteration count is one a significant
fraction of the time would also be useful.

> This does not let you update the profile very precisely after peeling
> (including peeling done by the vectorizer), but it needs only 2 counters
> per loop.
> 
> What other passes, besides peeling, would immediately benefit from a
> histogram?  I wonder if you can think of a better scheme?

Epilogue peeling in the vectorizer (for the remainder of iterations,
thus profiling N % vectorization-factor, which amounts to a value
profile of niter & <low-bits>).
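
For a power-of-two vectorization factor that is just the low bits,
i.e. (sketch; names hypothetical):

  /* Number of scalar iterations left for the epilogue after the
     vectorized loop body; VF is the vectorization factor.  */
  unsigned epilogue_niter = niter & (VF - 1);   /* == niter % VF */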

Peeling for alignment profitability (we do too much of that IMHO),
thus value-profiling of <read1-addr> | <read2-addr> | ...
and <write1-addr> | <write2-addr> | ... (not sure if that would
be fine-grained enough - we want to know if peeling for alignment
will likely make many accesses aligned, not just the one we peel for).
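
That is, value-profile something like (sketch; names hypothetical):

  /* If the OR of all access addresses has its low bits clear modulo
     the vector size, every access is aligned at the same time; a
     nonzero value means at least one access is misaligned.  */
  uintptr_t combined = (uintptr_t) read1_addr | (uintptr_t) read2_addr
                       | (uintptr_t) write1_addr;
  unsigned misalign = combined & (VECTOR_SIZE - 1);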

> > 
> > > Currently we use the profile each time estimate_numbers_of_iterations_loop
> > > is called to recompute this value.  This is not very safe because the
> > > profile may be mis-updated.  It seems safer to compute it once and
> > > maintain it throughout the compilation.
> > > 
> > > Notice that I removed:
> > > -  /* Force estimate compuation but leave any existing upper bound in place.  */
> > > -  loop->any_estimate = false;
> > > from the beginning of estimate_numbers_of_iterations_loop.  I cannot make
> > > sense of this.  Even without a profile, if we have an estimate we are better
> > > off maintaining it, because later we may not be able to derive it again.
> > > There seems to be no code that is forced by setting loop->any_estimate = true.
> > > The only code that cares seems to be record_niter_bound, which only decreases
> > > existing estimates.  This seems like a sane procedure - we don't roll loops.
> > > 
> > > Bootstrapped/regtested x86_64-linux, OK?
> > 
> > Ok.  Did you check what this does to SPEC with FDO?
> 
> I didn't do a full SPEC run with FDO, only tested tramp3d and xalancbmk,
> which I have readily available.  Any regressions would however point to
> loop info updating bugs (or wrong use of the profile), so I will look into
> them if they appear.  I am trying to benchmark Firefox now.

Ok.

Thanks,
Richard.
