https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84323
--- Comment #3 from Antony Polukhin <antoshkka at gmail dot com> ---
Just noticed that libc++ already does this optimization: https://godbolt.org/z/alw1sq

libc++ reads the content of std::once_flag directly and skips all the thread-local accesses if call_once previously succeeded.
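
For illustration only, here is a minimal sketch of the kind of fast path described above. It is not the libc++ or libstdc++ code: the names `fast_once_flag` and `fast_call_once` are hypothetical, and the flag layout (an atomic bool plus a mutex) is an assumption chosen to keep the example short. The point is that a completed call costs a single acquire load and touches no thread-local storage.

```cpp
#include <atomic>
#include <mutex>
#include <utility>

// Hypothetical flag type for illustration; real std::once_flag layouts differ.
struct fast_once_flag {
    std::atomic<bool> done{false};  // set once the callable has returned normally
    std::mutex slow_path_lock;      // serializes callers that reach the slow path
};

template <class F>
void fast_call_once(fast_once_flag& flag, F&& f) {
    // Fast path: if a previous call succeeded, return after one acquire load.
    // No thread-local variables are read or written here.
    if (flag.done.load(std::memory_order_acquire))
        return;

    // Slow path: take the lock and re-check, since another thread may have
    // finished the call while this one was waiting.
    std::lock_guard<std::mutex> guard(flag.slow_path_lock);
    if (!flag.done.load(std::memory_order_relaxed)) {
        // If f throws, done stays false and a later caller retries,
        // which mirrors the exceptional-call behaviour of std::call_once.
        std::forward<F>(f)();
        flag.done.store(true, std::memory_order_release);
    }
}
```

Calling `fast_call_once(flag, init)` twice on the same flag runs `init` only the first time; the second call returns from the fast path.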