On 13/10/2011, at 12:58 AM, Richard Guenther wrote:

> On Wed, Oct 12, 2011 at 8:50 AM, Maxim Kuvyrkov <ma...@codesourcery.com> 
> wrote:
>> The following patch adds a new knob to make GCC perform several iterations of 
>> early optimizations and inlining.
>> 
>> This is for dont-care-about-compile-time-optimize-all-you-can scenarios.  
>> Performing several iterations of optimizations does significantly improve 
>> code speed on a certain proprietary source base.  Some hand-tuning of the 
>> parameter value is required to get optimum performance.  Another good use 
>> for this option is for search and ad-hoc analysis of cases where GCC misses 
>> optimization opportunities.
>> 
>> With the default setting of '1', nothing is changed from the current status 
>> quo.
>> 
>> The patch was bootstrapped and regtested with 3 iterations set by default on 
>> i686-linux-gnu.  The only failures in regression testsuite were due to 
>> latent bugs in handling of EH information, which are being discussed in a 
>> different thread.
>> 
>> Performance impact on the standard benchmarks is not conclusive: there are 
>> improvements in SPEC2000 of up to 4% and regressions of up to 2%, see [*].  
>> SPEC2006 benchmarks will take another day or two to complete and I will 
>> update the spreadsheet then.  The benchmarks were run on a Core2 system for 
>> all combinations of {-m32/-m64}{-O2/-O3}.
>> 
>> Effect on compilation time is fairly predictable, about 10% compile time 
>> increase with 3 iterations.
>> 
>> OK for trunk?
> 
> I don't think this is a good idea, especially in the form you implemented it.
> 
> If we'd want to iterate early optimizations we'd want to do it by iterating
> an IPA pass so that we benefit from more precise size estimates
> when trying to inline a function the second time.  

Could you elaborate on this a bit?  Early optimizations are gimple passes, so 
I'm missing your point here.

> Also statically
> scheduling the passes will mess up dump files and you have no
> chance of say, noticing that nothing changed for function f and its
> callees in iteration N and thus you can skip processing them in
> iteration N + 1.

Yes, these are the shortcomings.  The dump file naming can be fixed, e.g., by 
adding a suffix to the pass names on iterations after the first one.  The 
analysis to avoid unnecessary iterations is a more complex problem.
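
To make the two points concrete, here is a toy Python model, not GCC code.  
The call graph, the `optimize` callback and the `.iterN` suffix format are all 
assumptions made up for illustration.  It reruns a function in iteration N + 1 
only if that function or one of its direct callees changed in iteration N, and 
it tags runs after the first with a suffix that a dump file name could carry.

```python
# Toy model only: nothing here is a GCC API. The function names, the call
# graph and the ".iterN" suffix format are hypothetical.

def iterate_early_opts(callees, optimize, max_iters):
    """Run `optimize` on each function for at most `max_iters` iterations.

    callees:  dict mapping a function name to the set of its callee names
    optimize: callable (name, iteration) -> True if the function changed
    Returns a list of (iteration, name, dump_suffix), one entry per run.
    """
    log = []
    worklist = set(callees)  # the first iteration processes every function
    for iteration in range(1, max_iters + 1):
        if not worklist:
            break  # nothing left to reprocess, stop early
        changed = set()
        for fn in sorted(worklist):
            # Runs after the first get a suffix so dumps do not collide.
            suffix = "" if iteration == 1 else f".iter{iteration}"
            log.append((iteration, fn, suffix))
            if optimize(fn, iteration):
                changed.add(fn)
        # Reprocess a function only if it or one of its callees changed.
        worklist = {fn for fn, cs in callees.items()
                    if fn in changed or cs & changed}
    return log
```

With a call graph main -> f -> g plus an unrelated h, and a callback that 
reports a change only for g in iteration 1, the second iteration reruns just f 
and g, and no third iteration happens.  A real implementation would also need a 
reliable way to tell that a function "changed", which is the hard part.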

> 
> So, at least you should split the pass_early_local_passes IPA pass
> into three, you'd iterate over the 2nd (definitely not over 
> pass_split_functions
> though), the third would be pass_profile and pass_split_functions only.
> And you'd iterate from the place the 2nd IPA pass is executed, not
> by scheduling them N times.

OK, I will look into this.
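
As I understand the suggestion, the shape would be roughly as in the toy 
Python model below.  This is a sketch under assumptions, not GCC code: the 
contents of the first and second groups are placeholders, and only 
pass_profile and pass_split_functions in the third group come from your 
description.  The driver loops over the second group at the point where it 
runs, instead of the pass list containing N copies of it.

```python
# Toy model only: pass groups are lists of (name, callable) pairs, and every
# name except pass_profile / pass_split_functions is a placeholder.

def run_split_pipeline(functions, first, iterated, last, max_iters):
    """Run `first` once, `iterated` up to `max_iters` times, `last` once.

    Each pass callable takes (function_name, iteration) and returns True
    if it changed the function. Returns a trace of (pass, function,
    iteration) tuples in execution order.
    """
    trace = []

    def run_group(group, iteration):
        changed = False
        for fn in functions:
            for name, run in group:
                trace.append((name, fn, iteration))
                if run(fn, iteration):
                    changed = True
        return changed

    run_group(first, 1)
    for iteration in range(1, max_iters + 1):
        # The loop lives in the driver, so it can stop as soon as an
        # iteration changes nothing.
        if not run_group(iterated, iteration):
            break
    run_group(last, 1)  # never iterated
    return trace
```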

> 
> Then you'd have to analyze the compile-time impact of the IPA
> splitting on its own when not iterating.  Then you should look
> at which optimizations were actually performed and
> led to the improvement (I can see some indirect inlining
> happening, but everything else would be a bug in present
> optimizers in the early pipeline - they are all designed to be
> roughly independent of each other and _not_ expose new
> opportunities by iteration).  Thus - testcases?

The initial motivation for the patch was to enable more indirect inlining and 
devirtualization opportunities.  Since then I have found the patch helpful in 
searching for optimization opportunities and bugs.  E.g., the performance of 
SPEC2006's 471.omnetpp drops 20% with 2 additional iterations of early 
optimizations [*].  
Given that applying more optimizations should, theoretically, not decrease 
performance, there is likely a very real bug or deficiency behind that.

Thank you,

[*] 
https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFE&hl=en_US

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics


