Compared to 3.4, the default inlining limits in 4.0 cause a 340% performance regression on the tramp3d-v3.cpp testcase here: http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d-v3.cpp.gz
The regression can be attributed to the inlining limits, as patching both compilers with the leafify patch results in the same performance. Compilation options used are

  -Dleafify=fooblah -O2 -fpeel-loops -ffast-math -march=pentium4 -mfpmath=sse -fno-exceptions

Binary size is "improved" by about 9% with the current defaults. Using --param max-inline-insns-single=1000 worsens the situation.

Playing with the inlining params gives the following (a "-" means the parameter was left at its default; negative regressions are improvements):

  max-inline-insns-single  large-function-growth  inline-unit-growth  regression
  -                        -                      -                   340%
  1000                     -                      -                   375%
  -                        500                    -                   348%
  -                        -                      200                 -36% (1% size regression)
  -                        -                      175                 -35% (4% size improvement)
  -                        -                      165                 -12%
  -                        -                      150                 -12% (!?)
  -                        -                      100                 232%

So I guess limiting overall unit growth is bad. Can we disable limiting at -Os, or provide a higher default value? The "correct" value will be different depending on the application.

Also, the documented default value for inline-unit-growth is not what it actually seems to be (it is 50 reading params.def; large-function-growth is also not correctly documented). If we make the documented values the default, we get a 68% compile time and a 3.7% code size regression for a 71% performance improvement (this was including "correcting" the large-function-growth limit, which seems to hurt rather than help).

--
Summary: Inlining limits cause 340% performance regression
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at tat dot physik dot uni-tuebingen dot de
CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
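
For anyone who wants to repeat the inline-unit-growth measurements reported above, the builds could be scripted roughly as follows. This is only a minimal sketch and not the script used for the numbers in this report. The base flags and --param names are the ones quoted above; the helper name build_command, the output name tramp3d, and the assumption that the testcase has already been unpacked to tramp3d-v3.cpp are hypothetical. It only prints the command lines; timing the resulting binaries is left out.

```python
import shlex

# Compilation options exactly as quoted in the report.
BASE_FLAGS = [
    "-Dleafify=fooblah", "-O2", "-fpeel-loops", "-ffast-math",
    "-march=pentium4", "-mfpmath=sse", "-fno-exceptions",
]


def build_command(params, source="tramp3d-v3.cpp", output="tramp3d",
                  compiler="g++"):
    """Return the argv list for one build.

    params maps a --param name to its value, e.g.
    {"inline-unit-growth": 200}. The source, output and compiler
    defaults are assumptions made for this sketch.
    """
    cmd = [compiler, *BASE_FLAGS]
    for name, value in params.items():
        cmd += ["--param", f"{name}={value}"]
    cmd += [source, "-o", output]
    return cmd


if __name__ == "__main__":
    # inline-unit-growth values taken from the table in the report.
    for growth in (200, 175, 165, 150, 100):
        print(shlex.join(build_command({"inline-unit-growth": growth})))
```

Passing the returned list to subprocess.run would perform the actual build; the sketch stops at printing so it can be checked without a compiler installed.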