http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57448
Bug ID: 57448
Summary: GCSE generates incorrect code with acquire barrier
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jleahy+gcc at gmail dot com

The following code:

extern int seq;
extern int data;

int reader() {
    int data_copy;
    int seq_copy;

    for (int i=0; i<10; ++i) {
        __atomic_load(&seq, &seq_copy, __ATOMIC_ACQUIRE);
        data_copy = data;
    }

    return data_copy + seq_copy;
}

Compiled with "g++ -masm=intel -std=c++11 -S -O -fgcse test.cpp" using gcc 4.8.0 on x86_64 Linux, generates the following assembler:

    .file   "test.cpp"
    .intel_syntax noprefix
    .text
    .globl  _Z6readerv
    .type   _Z6readerv, @function
_Z6readerv:
.LFB0:
    .cfi_startproc
    mov     edx, 10
    mov     ecx, DWORD PTR data[rip]
.L3:
    mov     eax, DWORD PTR seq[rip]
    sub     edx, 1
    jne     .L3
    add     eax, ecx
    ret
    .cfi_endproc
.LFE0:
    .size   _Z6readerv, .-_Z6readerv
    .ident  "GCC: (GNU) 4.8.0"
    .section        .note.GNU-stack,"",@progbits

Here the load of data was hoisted above the load of seq, which is in violation of the acquire memory ordering.

On the wiki here (http://gcc.gnu.org/wiki/Atomic/GCCMM/Optimizations/Details) it says "acquire is a barrier to hoisting code" and "CSE basically hoists subexpressions into temporaries, so it would have the same logic apply as PRE: valid across release, invalid across an acquire."
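
For context, here is a minimal sketch of the kind of code that relies on this ordering. It is not part of the test case above. The writer thread, the value 42, and the names shared_seq, shared_data, writer, reader_once and publish_and_read are all assumptions made for illustration; shared_seq and shared_data stand in for the extern variables seq and data.

```cpp
#include <thread>

// Hypothetical stand-ins for the extern variables in the report.
int shared_seq = 0;   // flag, accessed only through __atomic builtins
int shared_data = 0;  // payload, accessed with plain loads and stores

// Writer: store the payload, then set the flag with release semantics.
void writer() {
    shared_data = 42;
    int one = 1;
    __atomic_store(&shared_seq, &one, __ATOMIC_RELEASE);
}

// Reader: spin on an acquire load of the flag, then read the payload.
// The payload load must not be moved above the acquire load.
int reader_once() {
    int seq_copy;
    do {
        __atomic_load(&shared_seq, &seq_copy, __ATOMIC_ACQUIRE);
    } while (seq_copy == 0);
    return shared_data;
}

// Illustrative driver: run one writer and one reader, return what was read.
int publish_and_read() {
    shared_seq = 0;   // reset before any thread is started
    shared_data = 0;
    int result = 0;
    std::thread r([&result] { result = reader_once(); });
    std::thread w(writer);
    w.join();
    r.join();
    return result;
}
```

With acquire/release honoured, reader_once can only observe the payload written before the flag was set. If the compiler moved the load of shared_data above the acquire load, as happened with data in the assembler above, the reader could see a stale value instead. Depending on the platform, building this sketch may need -pthread.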