http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57448
Bug ID: 57448
Summary: GCSE generates incorrect code with acquire barrier
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jleahy+gcc at gmail dot com
The following code:
extern int seq;
extern int data;

int reader() {
    int data_copy;
    int seq_copy;
    for (int i=0; i<10; ++i) {
        __atomic_load(&seq, &seq_copy, __ATOMIC_ACQUIRE);
        data_copy = data;
    }
    return data_copy + seq_copy;
}
when compiled with "g++ -masm=intel -std=c++11 -S -O -fgcse test.cpp" using gcc
4.8.0 on x86_64 Linux, generates the following assembly:
        .file "test.cpp"
        .intel_syntax noprefix
        .text
        .globl _Z6readerv
        .type _Z6readerv, @function
_Z6readerv:
.LFB0:
        .cfi_startproc
        mov edx, 10
        mov ecx, DWORD PTR data[rip]
.L3:
        mov eax, DWORD PTR seq[rip]
        sub edx, 1
        jne .L3
        add eax, ecx
        ret
        .cfi_endproc
.LFE0:
        .size _Z6readerv, .-_Z6readerv
        .ident "GCC: (GNU) 4.8.0"
        .section .note.GNU-stack,"",@progbits
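Here the load of data has been hoisted out of the loop, so it executes once,
before every acquire load of seq. This violates the acquire memory ordering,
which requires the load of data in each iteration to stay after that
iteration's load of seq.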
The wiki page on atomic optimizations
(http://gcc.gnu.org/wiki/Atomic/GCCMM/Optimizations/Details) says "acquire is
a barrier to hoisting code" and "CSE basically hoists subexpressions into
temporaries, so it would have the same logic apply as PRE: valid across
release, invalid across an acquire."
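
To illustrate why the ordering matters, here is a minimal sketch of the
publish/consume pattern that an acquire load is meant to support. It is not
part of the test case above: writer() is a hypothetical function added for
illustration, seq is declared as std::atomic<int> instead of being accessed
through the __atomic builtins, and the reader only reads data after it has
observed the flag, so that the sketch stays free of data races if the two
functions are run on different threads.

```cpp
#include <atomic>

// Hypothetical stand-ins for the extern variables in the test case.
std::atomic<int> seq{0};
int data = 0;

// Hypothetical writer: store the payload, then publish it with a
// release store to seq.
void writer(int value) {
    data = value;
    seq.store(1, std::memory_order_release);
}

// Reader loosely modelled on the test case. Unlike the test case, data is
// only read once the acquire load has observed the writer's store.
int reader() {
    int data_copy = 0;
    int seq_copy = 0;
    for (int i = 0; i < 10; ++i) {
        seq_copy = seq.load(std::memory_order_acquire);
        if (seq_copy != 0) {
            // Ordered after the acquire load above; moving this load
            // before the loop would break that ordering.
            data_copy = data;
        }
    }
    return data_copy + seq_copy;
}
```

If an iteration's acquire load observes the writer's release store, the read
of data that follows it is guaranteed to see the published value. A load of
data performed once before the loop, as in the generated assembly, carries no
such guarantee.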