https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66881
            Bug ID: 66881
           Summary: Possibly inefficient std::atomic<int> codegen on x86 for
                    simple arithmetic
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoeppe at google dot com
  Target Milestone: ---

Consider these two simple versions of addition:

    #include <atomic>

    std::atomic<int> x;
    int y;

    void f(int a)
    {
        x.store(x.load(std::memory_order_relaxed) + a,
                std::memory_order_relaxed);
    }

    void g(int a)
    {
        y += a;
    }

GCC generates the following assembly:

    f(int):
            mov     eax, DWORD PTR x[rip]
            add     edi, eax
            mov     DWORD PTR x[rip], edi
            ret
    g(int):
            add     DWORD PTR y[rip], edi
            ret

Now, it is clear to me that the correct atomic codegen for store() and load()
is "mov", as it appears here, but why aren't the two consecutive operations
folded into a single add? Aren't the semantics and the memory ordering the
same? x86 says that (most) "reads" and "writes" are strongly ordered; doesn't
that apply to the read and write produced by "add", too?

(My original motivation came from a variant of this with floats, where the
non-atomic code executed noticeably faster, even though I would have expected
the two to produce the same machine code.)
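
For comparison, here is a minimal sketch that places the two functions from the report next to a genuine atomic read-modify-write. The function h() and the zero initializers are additions for illustration and are not part of the original test case. Note that f() is not an atomic increment in the C++ sense: it is a relaxed load followed by a separate relaxed store, so an update made by another thread between the two can be overwritten. On x86, fetch_add() is typically compiled to a lock-prefixed instruction, which is a different (and usually more expensive) operation than the plain memory-destination "add" asked about above.

```cpp
#include <atomic>

std::atomic<int> x{0};  // zero initializer added for this sketch
int y = 0;

// From the report: relaxed load, add, relaxed store.
// Each access is atomic on its own, but the sequence as a whole is not.
void f(int a)
{
    x.store(x.load(std::memory_order_relaxed) + a,
            std::memory_order_relaxed);
}

// From the report: plain, non-atomic addition.
void g(int a)
{
    y += a;
}

// Hypothetical addition (not in the report): a true atomic
// read-modify-write with relaxed ordering.
void h(int a)
{
    x.fetch_add(a, std::memory_order_relaxed);
}
```

Compiling this with something like "g++ -O2 -S" should make it possible to compare the code generated for f(), g() and h() side by side; the exact output depends on the compiler version and flags.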