On 13/10/2011, at 12:58 AM, Richard Guenther wrote:

> On Wed, Oct 12, 2011 at 8:50 AM, Maxim Kuvyrkov <ma...@codesourcery.com> wrote:
>> The following patch adds a new knob to make GCC perform several
>> iterations of early optimizations and inlining.
>>
>> This is for dont-care-about-compile-time-optimize-all-you-can scenarios.
>> Performing several iterations of optimizations does significantly
>> improve code speed on a certain proprietary source base.  Some
>> hand-tuning of the parameter value is required to get optimum
>> performance.  Another good use for this option is searching for and
>> ad-hoc analysis of cases where GCC misses optimization opportunities.
>>
>> With the default setting of '1', nothing is changed from the current
>> status quo.
>>
>> The patch was bootstrapped and regtested with 3 iterations set by
>> default on i686-linux-gnu.  The only failures in the regression
>> testsuite were due to latent bugs in handling of EH information, which
>> are being discussed in a different thread.
>>
>> Performance impact on the standard benchmarks is not conclusive: there
>> are improvements in SPEC2000 of up to 4% and regressions down to -2%,
>> see [*].  SPEC2006 benchmarks will take another day or two to complete,
>> and I will update the spreadsheet then.  The benchmarks were run on a
>> Core2 system for all combinations of {-m32/-m64}{-O2/-O3}.
>>
>> Effect on compilation time is fairly predictable: about a 10%
>> compile-time increase with 3 iterations.
>>
>> OK for trunk?
>
> I don't think this is a good idea, especially in the form you
> implemented it.
>
> If we'd want to iterate early optimizations we'd want to do it by
> iterating an IPA pass so that we benefit from more precise size
> estimates when trying to inline a function the second time.
Could you elaborate on this a bit?  Early optimizations are GIMPLE passes,
so I'm missing your point here.

> Also statically scheduling the passes will mess up dump files and you
> have no chance of, say, noticing that nothing changed for function f and
> its callees in iteration N and thus you can skip processing them in
> iteration N + 1.

Yes, these are the shortcomings.  The dump file name changes can be fixed,
e.g., by adding a suffix to the passes on iterations after the first one.
The analysis to avoid unnecessary iterations is a more complex problem.

> So, at least you should split the pass_early_local_passes IPA pass into
> three: you'd iterate over the 2nd (definitely not over
> pass_split_functions though); the third would be pass_profile and
> pass_split_functions only.  And you'd iterate from the place the 2nd IPA
> pass is executed, not by scheduling them N times.

OK, I will look into this.

> Then you'd have to analyze the compile-time impact of the IPA splitting
> on its own when not iterating.  Then you should look at what actually
> were the optimizations that were performed that led to the improvement
> (I can see some indirect inlining happening, but everything else would
> be a bug in present optimizers in the early pipeline - they are all
> designed to be roughly independent of each other and _not_ expose new
> opportunities by iteration).  Thus - testcases?

The initial motivation for the patch was to enable more indirect inlining
and devirtualization opportunities.  Since then I have found the patch
helpful in searching for optimization opportunities and bugs.  E.g.,
SPEC2006's 471.omnetpp drops 20% with 2 additional iterations of early
optimizations [*].  Given that applying more optimizations should,
theoretically, not decrease performance, there is likely a very real bug
or deficiency behind that.  A reduced example of the kind of pattern I
have in mind is appended at the end of this message.

Thank you,

[*] https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFE&hl=en_US

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
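For illustration, a minimal, hypothetical C++ example of the
devirtualization-then-inlining pattern mentioned above (the names and the
exact pass behavior are illustrative assumptions, not taken from the patch,
the testsuite, or the benchmarks).  The idea: a first round of early
inlining inlines the wrapper, which exposes the dynamic type of the object;
later early passes can then fold the virtual call into a direct call, and
only a second round of early inlining gets a chance to inline the
devirtualized callee.

/* Hypothetical reduced example -- for illustration only.  */

struct B
{
  virtual int f (int x) { return x + 1; }
};

/* Wrapper making a virtual call; the dynamic type of *b is unknown here,
   so the call cannot be devirtualized when this function is considered
   in isolation.  */
static int
call_f (B *b, int v)
{
  return b->f (v);
}

int
test (int v)
{
  B b;
  /* Once call_f is inlined into test by the first round of early
     inlining, the dynamic type of the object becomes visible, the
     virtual call can be turned into a direct call to B::f, and a
     second round of early inlining can inline that call as well.  */
  return call_f (&b, v);
}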