https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99613
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Even swapping the two calls would IMHO be a significant slowdown, because __cxa_atexit holds a global lock under the hood (fortunately not across the duration of the whole user ctor, only across the internal bookkeeping it needs to do).
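
To make the locking cost concrete, here is a toy model of an atexit-style registry. It is only a sketch, not the actual __cxa_atexit implementation: the namespace `toy`, the `Registry` type, and the `register_dtor` and `run_all` names are all hypothetical. It shows a single global lock that is held just long enough to update the list of registered destructors, so every thread that registers a destructor serializes on that one lock.

```cpp
#include <mutex>
#include <utility>
#include <vector>

namespace toy {  // hypothetical names, for illustration only

using dtor_fn = void (*)(void*);

struct Registry {
    std::mutex lock;  // one lock shared by every registering thread
    std::vector<std::pair<dtor_fn, void*>> entries;

    // Models the registration step: the lock covers only the
    // bookkeeping (appending one entry), not any user code.
    int register_dtor(dtor_fn fn, void* obj) {
        std::lock_guard<std::mutex> guard(lock);
        entries.emplace_back(fn, obj);
        return 0;
    }

    // Runs the registered callbacks in reverse order of registration.
    // The lock is dropped before each callback is invoked.
    void run_all() {
        for (;;) {
            std::pair<dtor_fn, void*> entry;
            {
                std::lock_guard<std::mutex> guard(lock);
                if (entries.empty()) return;
                entry = entries.back();
                entries.pop_back();
            }
            entry.first(entry.second);
        }
    }
};

}  // namespace toy
```

In this model the critical section is short, but it is still a point of contention: any other lock held by the caller while it calls `register_dtor` is held for that much longer.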