[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression

hubicka at ucw dot cz Mon, 06 Dec 2004 07:03:50 -0800

------- Additional Comments From hubicka at ucw dot cz  2004-12-06 15:03 -------
Subject: Re:  [4.0 Regression] Inlining limits cause 340% performance regression

> 
> ------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen 
> dot de  2004-12-06 14:31 -------
> Subject: Re:  [4.0 Regression] Inlining limits
>  cause 340% performance regression
> 
> On 6 Dec 2004, hubicka at ucw dot cz wrote:
> 
> > > > the order of inlining decisions affecting this.  I would be curious how
> > > > those results compare to leafify and whether the 0m27s is not caused by
> > > > misoptimization.
> > >
> > > You can check for misoptimization by looking at the final output.
> > > I.e. the rh,vx,vy and vz sums should be nearly zero, the T sum
> > > will increase with the number of iterations.
> > >
> > > With mainline, -O2 -fpeel-loops -march=pentium4 -ffast-math
> > > -D__NO_MATH_INLINES (we still need explicit -fpeel-loops for
> > > unrolling for (i=0;i<3;++i) a[i]=0;), I need 0m17s for -n 10 with
> > > leafification turned on, with it turned off, runtime increases
> > > to 0m31s with --param inline-unit-growth=175.
> >
> > I compiled with -O3; would it be possible for you to measure how much
> > speedup you get on mainline with -O3 and -O3+leafify?  That would
> > probably allow me to relate those numbers somehow.
> 
> 0m23s for -O3+leafify, 1m54s for -O3, 0m35s for -O3 --param
> inline-unit-growth=150.

It looks like I get a 4-fold speedup on tree profiling with profiling,
compared to tree profiling on mainline, which is equivalent to the
speedup you are seeing with the leafify patch.  That sounds pretty
promising (so the new heuristics can get the leafify idea without the
hint from the user and without hitting the code growth problems).

It would be nice to experiment with this a little.  In general, the
heuristics can be viewed as having three players: the limits
(specified via --param) that they must obey; the cost model (without
profiling, the estimated growth for inlining into all callees; with
profiling, the ratio of execute_count to estimated growth for inlining
one call); and the bin packing algorithm, which optimizes the gains
while obeying the limits.
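
To make the three players concrete, here is a rough sketch in Python.
It is not GCC's actual code: the names (CallEdge, badness,
greedy_inline, max_unit_growth_percent) are hypothetical, the sizes are
in arbitrary units, and a real implementation would presumably
re-estimate growth after every inlining step, which this sketch does
not do.  It only shows a cost model ranking call sites by
execute_count per unit of growth and a greedy packer obeying one
unit-growth limit.

```python
import heapq
from dataclasses import dataclass


@dataclass
class CallEdge:
    """One call site (hypothetical model, not a GCC data structure)."""
    caller: str
    callee: str
    growth: int          # estimated size increase if this call is inlined
    execute_count: int   # profile count for this call site


def badness(edge):
    """Cost model with profiling: lower (more negative) is better.

    Ranks by execute_count per unit of estimated growth; growth is
    clamped to 1 so shrinking or zero-growth candidates do not divide
    by zero.
    """
    return -edge.execute_count / max(edge.growth, 1)


def greedy_inline(edges, unit_size, max_unit_growth_percent):
    """Greedy packing: take the best-ranked edges that still fit.

    The limit plays the role of a --param style bound on how much the
    whole unit may grow (assumed here to be a simple percentage).
    """
    limit = unit_size + unit_size * max_unit_growth_percent // 100
    # The index breaks ties so CallEdge objects are never compared.
    heap = [(badness(e), i, e) for i, e in enumerate(edges)]
    heapq.heapify(heap)

    size = unit_size
    inlined = []
    while heap:
        _, _, edge = heapq.heappop(heap)
        if size + edge.growth > limit:
            continue  # does not fit; a smaller candidate still might
        size += edge.growth
        inlined.append(edge)
    return inlined, size
```

Calling greedy_inline with the same edges and a different
max_unit_growth_percent is one way to see how the set of inlined calls
changes when an individual limit is moved while the cost model stays
fixed.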

With profiling, the cost model is pretty much realistic, and it would
be nice to figure out how the performance behaves when the individual
limits are changed, and why.  If you have some time for experimentation,
it would be very useful.  I am trying to do the same with SPEC and GCC,
but I have difficulty playing with pooma or Gerald's application as I
have little understanding of what is going on there.  I will try it
myself next, but any feedback would be very useful here.

My plan is to try to understand the limits first and then try to get
the cost model better without profiling, as it is a bit too clumsy to do
both at once.

Honza
> 
> Richard.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
