http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57249
Bug ID: 57249
Summary: Unrolling too late for inlining
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org

Hello,

this code is a variant of the code at
http://stackoverflow.com/questions/16493290/why-is-inlined-function-slower-than-function-pointer

typedef void (*Fn)();

long sum = 0;
inline void accu() { sum+=4; }
static const Fn map[4] = {&accu, &accu, &accu, &accu};

void f(bool opt) {
  const long N = 10000000L;
  if (opt)
  {
    for (long i = 0; i < N; i++)
    {
      accu();
      accu();
      accu();
      accu();
    }
  }
  else
  {
    for (long i = 0; i < N; i++)
    {
      for (int j = 0; j < 4; j++)
        (*map[j])();
    }
  }
}

In the first loop, g++ -O3 inlines the 4 accu() calls in the einline pass. Later passes then optimize the whole loop to a single +=.

In the second loop, the accu() calls only become visible once the inner loop has been unrolled, and no inlining pass runs after that. Even if one did, the right passes would still be needed to reduce the outer loop to sum+=160000000.

I am not sure what the right solution is, since unrolling too aggressively too early can hurt other optimizations.

Note that LLVM manages to optimize the whole function to a single +=.
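The following sketch is not part of the original report. It spells out, at the source level, the intermediate and final forms the report describes for the second loop. f_unrolled_by_hand, f_folded, total and check_totals are hypothetical names added for illustration. f_unrolled_by_hand shows the else-branch after full unrolling of the j-loop, where each table lookup has a constant index and can be resolved to accu. f_folded shows the single += that the report says LLVM reaches (4 calls x 4 per call x 10000000 iterations = 160000000). The runtime checks only confirm that the variants are semantically equivalent. To see what a given compiler actually emits, compile the file with g++ -O3 -S and read the assembly.

```cpp
#include <cassert>

typedef void (*Fn)();

long sum = 0;
inline void accu() { sum += 4; }
static const Fn map[4] = {&accu, &accu, &accu, &accu};

// Test case from the report, unchanged apart from layout.
void f(bool opt) {
  const long N = 10000000L;
  if (opt) {
    for (long i = 0; i < N; i++) {
      accu(); accu(); accu(); accu();
    }
  } else {
    for (long i = 0; i < N; i++) {
      for (int j = 0; j < 4; j++)
        (*map[j])();
    }
  }
}

// Hypothetical: the else-branch as it looks once the inner j-loop has been
// fully unrolled. Every index into the const table is now a constant, so
// each call target is known to be accu and could be inlined, provided an
// inlining pass still runs at that point.
void f_unrolled_by_hand() {
  const long N = 10000000L;
  for (long i = 0; i < N; i++) {
    (*map[0])();
    (*map[1])();
    (*map[2])();
    (*map[3])();
  }
}

// Hypothetical: the fully folded form of either branch.
void f_folded() { sum += 160000000L; }

// 4 calls per iteration * 4 added per call * N iterations.
static_assert(4L * 4L * 10000000L == 160000000L, "folded constant");

// Hypothetical helper: resets the counter, runs one variant, returns the total.
long total(void (*variant)()) {
  sum = 0;
  variant();
  return sum;
}

// All variants must add exactly the same amount to sum.
void check_totals() {
  assert(total([] { f(true); }) == 160000000L);
  assert(total([] { f(false); }) == 160000000L);
  assert(total(f_unrolled_by_hand) == 160000000L);
  assert(total(f_folded) == 160000000L);
}
```

The non-capturing lambdas passed to total convert implicitly to void (*)(), which keeps the helper a plain function-pointer interface like Fn in the report.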