On Fri, Sep 09, 2011 at 10:07:30AM +0200, Paolo Bonzini wrote:
> sync builtins are described in the documentation as being full
> memory barriers, with the possible exception of
> __sync_lock_test_and_set. However, GCC is not enforcing the fact
> that they are also full _optimization_ barriers.  The RTL produced
> by builtins does not in general include a memory optimization
> barrier such as a set of (mem/v:BLK (scratch:P)).
> 
> This can cause problems with lock-free algorithms, for example this:
> 
> http://libdispatch.macosforge.org/trac/ticket/35
> 
> This can be solved either in generic code, by wrapping sync builtins
> (before and after) with an asm("":::"memory"), or in the single
> machine descriptions by adding a memory barrier in parallel to the
> locked instructions or with the ll/sc instructions.
> 
> Is the above analysis correct?  Or should the users put explicit
> compiler barriers?

I'd say they should be optimization barriers too (and at the tree level
I think they do work that way, being represented as function calls), so if
they don't act as memory barriers in RTL, the *.md patterns should be
fixed.  The only exception should be IMHO the __SYNC_MEM_RELAXED
variants - if the CPU can reorder memory accesses across them at will,
why shouldn't the compiler be able to do the same as well?
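
Until the patterns are fixed, the generic-code workaround from the quoted
mail could look roughly like the sketch below.  The wrapper names
(compiler_barrier, fetch_add_with_barrier, cas_with_barrier, self_check)
are made up for illustration and are not anything GCC provides; only the
__sync builtins and the empty asm with a "memory" clobber are real.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier: emits no instructions, but the "memory" clobber
   stops GCC from caching or moving memory accesses across this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical wrapper: atomic fetch-and-add bracketed by compiler
   barriers.  Returns the value *p held before the addition. */
static inline int32_t fetch_add_with_barrier(int32_t *p, int32_t v)
{
    int32_t old;

    compiler_barrier();
    old = __sync_fetch_and_add(p, v);
    compiler_barrier();
    return old;
}

/* Hypothetical wrapper: compare-and-swap bracketed by compiler barriers.
   Returns nonzero if *p was equal to expected and has been replaced. */
static inline int cas_with_barrier(int32_t *p, int32_t expected,
                                   int32_t desired)
{
    int ok;

    compiler_barrier();
    ok = __sync_bool_compare_and_swap(p, expected, desired);
    compiler_barrier();
    return ok;
}

/* Single-threaded sanity check of the wrappers' return values. */
static void self_check(void)
{
    int32_t x = 5;

    assert(fetch_add_with_barrier(&x, 3) == 5);
    assert(x == 8);
    assert(cas_with_barrier(&x, 8, 1));
    assert(!cas_with_barrier(&x, 8, 2));
    assert(x == 1);
}
```

This only constrains the compiler; the hardware ordering still comes from
the builtin itself.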

        Jakub
