https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90606
Bug ID: 90606
Summary: Replace mfence with faster xchg for std::memory_order_seq_cst.
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: maxim.yegorushkin at gmail dot com
Target Milestone: ---

The following example:

    #include <atomic>

    std::atomic<int> a;

    void foo_seq_cst(int b) { a = b; }

Compiles with `gcc-9.1 -O3 -std=c++17 -pthread` into:

    foo_seq_cst(int):
            mov     DWORD PTR a[rip], edi
            mfence
            ret

Whereas `clang++-9 -O3 -std=c++17 -pthread` compiles it into:

    foo_seq_cst(int):                       # @foo_seq_cst(int)
            xchg    dword ptr [rip + a], edi
            ret

xchg was benchmarked to be 2-3x faster than mfence, and the Linux kernel switched to xchg where possible. gcc should also switch to using xchg for std::memory_order_seq_cst stores.

See:
https://lore.kernel.org/lkml/20160112150032-mutt-send-email-...@redhat.com/
https://stackoverflow.com/questions/56205324/why-do-gcc-inserts-mfence-where-clang-dont-use-it/
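
For anyone reproducing the difference, a minimal sketch of the store spelled three ways may help when comparing compiler output. The function names `store_seq_cst`, `store_via_exchange`, `store_release`, and `check_variants` are made up for illustration and are not part of the report. The comments about generated instructions reflect typical x86-64 output and should be confirmed by inspecting the assembly from your own compiler version.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> a;

// Equivalent to `a = b;` in the report: a sequentially consistent store.
// Per the report, gcc 9.1 emits mov + mfence here; clang 9 emits xchg.
void store_seq_cst(int b) { a.store(b, std::memory_order_seq_cst); }

// Sequentially consistent exchange with the old value discarded.
// On x86, xchg with a memory operand is implicitly locked, so compilers
// typically emit a single xchg and no separate fence.
void store_via_exchange(int b) { (void)a.exchange(b, std::memory_order_seq_cst); }

// Release store for comparison: weaker ordering, typically a plain mov on x86.
void store_release(int b) { a.store(b, std::memory_order_release); }

// Hypothetical self-check: every variant leaves the stored value visible.
void check_variants() {
    store_seq_cst(1);
    assert(a.load() == 1);
    store_via_exchange(2);
    assert(a.load() == 2);
    store_release(3);
    assert(a.load() == 3);
}
```

Compiling this with `-O3 -S` under each compiler and diffing the three functions should show whether a given version still uses mfence for the plain seq_cst store. Note that `store_release` is not a drop-in replacement, since it gives up the ordering guarantee the report is about.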