https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104515
Bug ID: 104515
Summary: trivially-destructible destructors interfere with loop optimization - maybe related to lifetime-dse.
Product: gcc
Version: og11 (devel/omp/gcc-11)
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gcc at rabensky dot com
Target Milestone: ---

This issue started in GCC-9.1, but a change in GCC-11 made it worse. It didn't exist in GCC-7.1 to GCC-8.5.

Short description:
-----------------

When we have a loop that can be optimized out, calling the destructor for a trivially-destructible type will prevent the optimization, starting from GCC-9.1.

These are loops that were correctly optimized out in GCC-7.1 to GCC-8.5.

This bug doesn't happen if we set -fno-lifetime-dse.

Interestingly enough - a type with a non-trivial destructor doesn't necessarily prevent the optimization.

How this became worse in GCC-11:
-------------------------------

In GCC-11 this also applies to calling the destructor of basic types (int, long etc.)

So loops that were optimized out in GCC-7.1 to GCC-10.3 are no longer optimized out.

Short reproducing example:
-------------------------

NOTE: No `#include`s are needed

```
using T = int;

struct Vec {
  T* end;
};

void pop_back_many(Vec& v, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    --v.end;
    v.end->~T();
  }
}
```

compiled with `-O3 -Wall`

In GCC-7 to GCC-10, `pop_back_many` optimizes out the loop (becomes `v.end-=n`). In GCC-11, the loop remains.

See https://godbolt.org/z/vTexxhxP9

NOTE that adding `-fno-lifetime-dse` will re-enable the loop optimization.

Why this matters
----------------

This prevents optimization of a loop over `std::vector<int>::pop_back()`, which is a very common use case!

Loops that optimize out in GCC-7.1 to GCC-10.3 will suddenly not optimize in GCC-11.1/2, making existing code run MUCH slower! (O(n) instead of O(1))

NOTE: std::vector<int>::resize is a lot slower than a loop over pop_back.
A loop over pop_back is currently the most efficient way to do pop_back_many!

More complete reproducing example:
---------------------------------

- We can replace the type `T` with a class that is trivially destructible. **In that case, the problem exists in previous versions of GCC as well**

- We can replace the type `T` with a class that has a user-provided destructor. **In that case, the loop correctly optimizes out if possible**

Actual examples: https://godbolt.org/z/7WqTPq3cE

compiled with `-O3 -Wall`

```
template <typename T>
struct Vec {
  T* end;
};

template <typename T>
void pop_back_many(Vec<T>& v, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    --v.end;
    v.end->~T();
  }
}

struct TrivialDestruct {
  ~TrivialDestruct()=default;
};

struct NoopDestruct {
  ~NoopDestruct(){}
};

unsigned count=0;

struct CountDestruct {
  ~CountDestruct(){++count;}
};

// Here loop optimization fails in GCC-11.1-11.2
// But succeeds in GCC 7.1-10.3
//
// NOTE that adding -fno-lifetime-dse re-enables the optimization
template void pop_back_many(Vec<int>&, unsigned);

// Here loop optimization fails in GCC-9.1-11.2
// But succeeds in GCC 7.1-8.5
//
// NOTE that adding -fno-lifetime-dse re-enables the optimization
template void pop_back_many(Vec<TrivialDestruct>&, unsigned);

// Here loop optimization succeeds in all versions
//
// NOTE that it's surprising that a no-op destructor can be optimized
// but a trivial destructor can't
template void pop_back_many(Vec<NoopDestruct>&, unsigned);

// Here loop optimization succeeds in all versions
//
// NOTE that it's surprising that a destructor with an action
// can be optimized, but a trivial destructor can't
template void pop_back_many(Vec<CountDestruct>&, unsigned);
```
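
Possible source-level workaround (sketch):
-----------------------------------------

For code that controls its own container, the loop can be skipped by hand when the element type is trivially destructible, which is the `v.end-=n` form the optimizer is expected to produce. The sketch below is only an illustration: the names `pop_back_many_loop`, `pop_back_many_workaround` and `same_end_pointer` are hypothetical and not part of the report, it assumes C++17 (`if constexpr`), and it has not been checked against the affected GCC versions. It does not help callers looping over `std::vector<int>::pop_back()`.

```cpp
#include <type_traits>

// Same shape as the Vec in the report: only the end pointer matters here.
template <typename T>
struct Vec {
  T* end;
};

// The loop from the report, unchanged apart from the (hypothetical) name.
template <typename T>
void pop_back_many_loop(Vec<T>& v, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    --v.end;
    v.end->~T();
  }
}

// Hypothetical workaround: for trivially destructible T there is nothing
// to destroy, so move the end pointer directly instead of looping.
template <typename T>
void pop_back_many_workaround(Vec<T>& v, unsigned n) {
  if constexpr (std::is_trivially_destructible_v<T>) {
    v.end -= n;
  } else {
    pop_back_many_loop(v, n);
  }
}

// Small helper for illustration: checks that both versions leave the end
// pointer in the same place for T = int. Only valid for n <= 16.
inline bool same_end_pointer(unsigned n) {
  if (n > 16) return false;
  int storage[16] = {};
  Vec<int> a{storage + 16};
  Vec<int> b{storage + 16};
  pop_back_many_loop(a, n);
  pop_back_many_workaround(b, n);
  return a.end == b.end && a.end == storage + 16 - n;
}
```

Both functions should leave `v.end` in the same place; the difference is only whether the compiler has to prove that the loop body is dead.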