On Fri, Sep 09, 2011 at 10:07:30AM +0200, Paolo Bonzini wrote:
> sync builtins are described in the documentations as being full
> memory barriers, with the possible exception of
> __sync_lock_test_and_set. However, GCC is not enforcing the fact
> that they are also full _optimization_ barriers. The RTL produced
> by builtins does not in general include a memory optimization
> barrier such as a set of (mem/v:BLK (scratch:P)).
>
> This can cause problems with lock-free algorithms, for example this:
>
> http://libdispatch.macosforge.org/trac/ticket/35
>
> This can be solved either in generic code, by wrapping sync builtins
> (before and after) with an asm("":::"memory"), or in the single
> machine descriptions by adding a memory barrier in parallel to the
> locked instructions or with the ll/sc instructions.
>
> Is the above analysis correct? Or should the users put explicit
> compiler barriers?

I'd say they should be optimization barriers too (and at the tree level
I think they work that way, being represented as function calls), so if
they don't act as memory barriers in RTL, the *.md patterns should be
fixed. The only exception should IMHO be the __SYNC_MEM_RELAXED
variants - if the CPU can reorder memory accesses across them at will,
why shouldn't the compiler be able to do the same as well?

	Jakub
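
For illustration only, the generic-code workaround from the quoted message (an asm("":::"memory") before and after a sync builtin) could look roughly like the sketch below. The names compiler_barrier and fetch_and_add_fenced are hypothetical, made up for this example; only __sync_fetch_and_add and the empty asm with a "memory" clobber are real GCC features. It assumes GCC or a compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a compiler-only barrier. It emits no machine
 * instruction; the "memory" clobber only tells GCC that memory may have
 * changed, so it must not cache values in registers or move loads and
 * stores across this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical wrapper: atomic fetch-and-add that is explicitly fenced
 * as an optimization barrier on both sides, regardless of whether the
 * target's *.md pattern for the builtin already acts as one. */
static inline int fetch_and_add_fenced(int *p, int v)
{
    int old;

    assert(p != NULL);

    compiler_barrier();               /* nothing sinks below this point  */
    old = __sync_fetch_and_add(p, v); /* hardware-level full barrier     */
    compiler_barrier();               /* nothing hoists above this point */

    return old;
}
```

If the *.md patterns are fixed as suggested above, such a wrapper becomes redundant; it is only a stopgap for user code.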