------- Additional Comments From hubicka at ucw dot cz 2004-12-06 13:40 -------
Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression
>
> ------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-12-06 13:18 -------
> Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression
>
> On 6 Dec 2004, hubicka at ucw dot cz wrote:
>
> > The cfg inliner per se is not too interesting.  What matters here is the
> > code size estimation and profitability estimation.  I am playing with
> > this now and trying to get profile based inlining working.
>
> Yes, I guess the cfg inliner and some early dead code removal passes
> should improve code size metrics for stuff like
>
> template <class X>
> struct Foo
> {
>   enum { val = X::val };
>   void foo()
>   {
>     if (val)
>       ...
>     else
>       ...
>   }
> };
>
> with val being const.
>
> > For -n10 and tramp3d.cc I need 2m14s on mainline, 1m31s on the current
> > tree-profiling.  With my new implementation I need 0m27s with profile
> > feedback and 2m53s without.  I wonder what makes the new heuristics work
> > worse without profiling, but just increasing the inline-unit-growth very
> > slightly (to 155) I get 0m42s.  This might be just a little instability in
>
> Note that inline-unit-growth is 50 by default, so 155 is not slightly
> increased.

OK, I will play around with 55 then :)

> > the order of inlining decisions affecting this.  I would be curious how
> > those results compare to leafify and whether the 0m27s is not caused by
> > misoptimization.
>
> You can check for misoptimization by looking at the final output.
> I.e. the rh, vx, vy and vz sums should be nearly zero, the T sum
> will increase with the number of iterations.
>
> With mainline, -O2 -fpeel-loops -march=pentium4 -ffast-math
> -D__NO_MATH_INLINES (we still need explicit -fpeel-loops for
> unrolling for (i=0;i<3;++i) a[i]=0;), I need 0m17s for -n 10 with
> leafification turned on; with it turned off, runtime increases
> to 0m31s with --param inline-unit-growth=175.
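
For reference, the template case quoted above can be made concrete with a minimal sketch. TraitsOn, TraitsOff, fast_path and slow_path are hypothetical names introduced only for illustration; they do not come from tramp3d.cc. The point is that val is a compile-time constant in every instantiation, so only one arm of the branch is ever reachable, yet a size estimate taken before dead code removal would presumably still count both arms.

```cpp
// Sketch only: TraitsOn, TraitsOff, fast_path and slow_path are hypothetical
// names, not taken from tramp3d.cc.
struct TraitsOn  { enum { val = 1 }; };
struct TraitsOff { enum { val = 0 }; };

inline int fast_path() { return 1; }
inline int slow_path() { return 2; }

template <class X>
struct Foo
{
  enum { val = X::val };   // compile-time constant for each instantiation

  int foo() const
  {
    // For any given X exactly one arm is reachable. Until the dead arm is
    // removed, a size estimate of foo() still sees both calls.
    if (val)
      return fast_path();
    else
      return slow_path();
  }
};

// Small wrappers so that each instantiation can be exercised.
inline int call_on()  { return Foo<TraitsOn>().foo(); }
inline int call_off() { return Foo<TraitsOff>().foo(); }
```

After the dead arm is dropped, each instantiation of foo() reduces to a single call, which is the size an inliner would ideally be working with.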
I compiled with -O3; would it be possible for you to measure how much
speedup you get on mainline with -O3 and -O3+leafify?  That would
probably allow me to relate those numbers somehow.

> > Unless I will observe it otherwise (on SPEC with intermodule), I will
> > apply my current patch and try to improve the profitability analysis
> > without profiling incrementally.  Ideally we ought to build estimated
> > profile and use it, but that needs some work so for the moment I guess I
> > will try to experiment with making loop depth available to the cgraph
> > code.
>
> Yes, loops could be "auto-leafified", but it will be difficult to
> statically check if that is worthwhile.

I guess just increasing priority for calls inside loops (something like
dividing the current cost estimation by loop nest) would do a good job for
now, but first I need to convince myself that the new rewrite does
a reasonable job even for the current cost metric before moving on.

Honza

> Richard.
>
> --
> Richard Guenther <richard dot guenther at uni-tuebingen dot de>
> WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
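
A minimal sketch of the loop-nest idea mentioned above (dividing the current cost estimate by loop nest so that calls inside loops get higher priority). This is not GCC's cgraph code: CallSite, priority and order_candidates are hypothetical names, and dividing by loop_depth + 1 rather than by the depth itself is an assumption made here so that calls outside any loop (depth 0) keep their original cost.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record for one call site; this is not GCC's cgraph_edge.
struct CallSite
{
  std::string callee;
  int size_cost;    // stand-in for the current cost estimate
  int loop_depth;   // 0 means the call is not inside any loop
};

// Lower value means "consider for inlining earlier".
// Assumption: divide by (loop_depth + 1) so that depth 0 leaves the cost
// unchanged and deeper nesting lowers the effective cost.
inline double priority(const CallSite& c)
{
  return static_cast<double>(c.size_cost) / (c.loop_depth + 1);
}

// Return the candidates ordered by ascending loop-adjusted cost.
inline std::vector<CallSite> order_candidates(std::vector<CallSite> sites)
{
  std::stable_sort(sites.begin(), sites.end(),
                   [](const CallSite& a, const CallSite& b)
                   { return priority(a) < priority(b); });
  return sites;
}
```

With this ordering, a call of cost 40 at loop depth 3 (adjusted cost 10) is considered before a call of cost 30 outside any loop, which is the kind of bias toward loop bodies the comment describes.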