https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91986
Bug ID: 91986
Summary: missed loop unrolling optimization
Product: gcc
Version: 9.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: trass3r at gmail dot com
Target Milestone: ---

#include <cstdint>
#include <utility>

const int N = 4;

void bitreverse(int* data)
{
    uint32_t j = 0;
    for (uint32_t i = 0; i < N; ++i)
    {
        if (j > i)
            std::swap(data[i], data[j]);

        uint32_t k = N/2;
        while (k <= j)
        {
            j -= k;
            k /= 2;
        }
        j += k;
    }
}

Even with -O3, gcc doesn't fully unroll the loop, and there are still quite a
few redundant instructions:
https://godbolt.org/z/cA-S8J

It's similar with N=8:
https://godbolt.org/z/niZPeS

Interestingly, if you change the swap line slightly, clang suddenly starts
using pshufd, which seems quite clever:
https://godbolt.org/z/5YJnJ1
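
For reference, tracing the N = 4 loop by hand suggests that the only exchange it performs is between data[1] and data[2], so a fully unrolled version could reduce to a single swap. The sketch below compares the two. The names bitreverse_loop and bitreverse_unrolled are hypothetical, and the extra `k > 0` guard is an assumption that is not in the reported code: without it, the inner loop in the last iteration reaches k == 0 with j == 0 and does not appear to terminate, which may itself affect what the optimizer is allowed to do.

```cpp
#include <cstdint>
#include <utility>

constexpr uint32_t N = 4;

// Loop from the report with one change (an assumption of this sketch):
// the inner loop also checks k > 0 so that the final iteration terminates.
void bitreverse_loop(int* data)
{
    uint32_t j = 0;
    for (uint32_t i = 0; i < N; ++i)
    {
        if (j > i)
            std::swap(data[i], data[j]);

        uint32_t k = N / 2;
        while (k > 0 && k <= j)
        {
            j -= k;
            k /= 2;
        }
        j += k;
    }
}

// Hand-unrolled equivalent for N == 4 only: j takes the values 0, 2, 1, 3
// at the top of each iteration, so j > i holds just once (i == 1, j == 2).
void bitreverse_unrolled(int* data)
{
    std::swap(data[1], data[2]);
}
```

Both functions should leave a four-element array in the same state, which gives a rough idea of the code one might hope to see at -O3 for this value of N.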