https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98775
Bug ID: 98775
Summary: missing optimization opportunity on nbody
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vanyacpp at gmail dot com
Target Milestone: ---

Created attachment 50015
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50015&action=edit
nbody.cpp

On the attached sample (208 LOC), clang 11.0 generates code that is almost twice as fast as the code generated by GCC 10.2 (-O3 -ffast-math -flto).

$ ./nbody 50000000

4.0s for clang vs 7.5s for GCC.

A quick look at the generated code shows that clang aggressively unrolled all inner loops. If I unroll all inner loops manually I get:

$ ./nbody-unrolled 50000000

3.7s for clang vs 6.3s for GCC.
17.6B instructions for clang vs 29.6B instructions for GCC.

While the first sample is subject to the unrolling heuristics, the second is about optimizing a completely linear chunk of code with many floating-point multiplications and additions.

I tried reducing the sample further, but I only came up with PR98774.
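
To illustrate what "unrolling inner loops manually" means here, below is a minimal sketch. It is not taken from the attached nbody.cpp: the `Vec3` type and both function names are hypothetical, chosen only to show the difference between a short constant-trip-count loop (whose unrolling is left to the compiler's heuristics) and the equivalent straight-line code (where only scheduling and arithmetic optimization remain).

```cpp
// Hypothetical 3-component vector; an assumption for illustration only,
// not the data layout used in the attached nbody.cpp.
struct Vec3 {
    double c[3];
};

// Rolled form: the trip count (3) is a compile-time constant, so whether
// this loop is fully unrolled depends on the compiler's unrolling heuristics.
double dist2_rolled(const Vec3& a, const Vec3& b) {
    double sum = 0.0;
    for (int k = 0; k < 3; ++k) {
        double d = a.c[k] - b.c[k];
        sum += d * d;
    }
    return sum;
}

// Manually unrolled form: the same computation written as straight-line
// code, with no loop left for the unroller to reason about.
double dist2_unrolled(const Vec3& a, const Vec3& b) {
    double dx = a.c[0] - b.c[0];
    double dy = a.c[1] - b.c[1];
    double dz = a.c[2] - b.c[2];
    return dx * dx + dy * dy + dz * dz;
}
```

Both functions compute the same squared distance. Comparing the assembly each compiler emits for the two forms at -O3 -ffast-math is one way to separate the unrolling question from the question of how well the straight-line floating-point code is optimized.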