https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66881
Bug ID: 66881
Summary: Possibly inefficient std::atomic<int> codegen on x86 for simple arithmetic
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: tkoeppe at google dot com
Target Milestone: ---
Consider these two simple versions of addition:
#include <atomic>

std::atomic<int> x;
int y;

void f(int a) {
    x.store(x.load(std::memory_order_relaxed) + a, std::memory_order_relaxed);
}

void g(int a) {
    y += a;
}
GCC generates the following assembly:
f(int):
    mov eax, DWORD PTR x[rip]
    add edi, eax
    mov DWORD PTR x[rip], edi
    ret
g(int):
    add DWORD PTR y[rip], edi
    ret
Now, it is clear to me that the correct atomic codegen for a relaxed store()
and load() is "mov", as it appears here, but why aren't the two consecutive
operations folded into a single "add", as in g()? Aren't the semantics and the
memory ordering the same? x86 says that (most) "reads" and "writes" are
strongly ordered; doesn't that apply to the read and write produced by "add",
too?
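
To make clear what is and is not being asked for, here is a minimal sketch
contrasting the split load/store from f() with a true atomic read-modify-write.
The names f_split and f_rmw are made up for this illustration. As far as I
understand, fetch_add() typically compiles to a lock-prefixed instruction on
x86 and is a stronger operation than what f() requests; the folding asked about
above would be a plain, unlocked "add" to memory, not this.

```cpp
#include <atomic>

std::atomic<int> x;

// Same as f() in the report: a relaxed load followed by a relaxed store.
// Each access is atomic on its own, but the pair is NOT an atomic
// read-modify-write: with concurrent callers, updates can be lost.
void f_split(int a) {
    x.store(x.load(std::memory_order_relaxed) + a, std::memory_order_relaxed);
}

// Hypothetical comparison point: a genuine atomic read-modify-write.
// Concurrent callers never lose an update here.
void f_rmw(int a) {
    x.fetch_add(a, std::memory_order_relaxed);
}
```

In single-threaded use both functions leave the same value in x; they differ
only in what is guaranteed when several threads call them at once.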
(My original motivation came from a variant of this with floats, where the
non-atomic code executed noticeably faster, even though I would have expected
the two to produce the same machine code.)
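
For reference, a sketch of what such a float variant might look like. This is
a reconstruction under the assumption that it mirrors f() and g() above with
int replaced by float; the names xf, yf, ff and gf are hypothetical and not
taken from the original code.

```cpp
#include <atomic>

std::atomic<float> xf;  // hypothetical float counterpart of x
float yf;               // hypothetical float counterpart of y

// Relaxed load, add, relaxed store: the float analogue of f().
void ff(float a) {
    xf.store(xf.load(std::memory_order_relaxed) + a, std::memory_order_relaxed);
}

// Plain non-atomic addition: the float analogue of g().
void gf(float a) {
    yf += a;
}
```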