On 09/09/2011 04:17 AM, Jakub Jelinek wrote:
On Fri, Sep 09, 2011 at 10:07:30AM +0200, Paolo Bonzini wrote:
The sync builtins are described in the documentation as being full
memory barriers, with the possible exception of
__sync_lock_test_and_set. However, GCC does not enforce that
they are also full _optimization_ barriers: the RTL produced
for the builtins does not in general include a memory optimization
barrier such as a set of (mem/v:BLK (scratch:P)).

This can cause problems with lock-free algorithms, for example this:

http://libdispatch.macosforge.org/trac/ticket/35

This can be solved either in generic code, by wrapping the sync
builtins (before and after) with an asm("":::"memory"), or in the
individual machine descriptions, by adding a memory barrier in
parallel to the locked instructions or to the ll/sc instructions.

Is the above analysis correct?  Or should the users put explicit
compiler barriers?
I'd say they should be optimization barriers too (and at the tree level
I think they work that way, being represented as function calls), so if
they don't act as memory barriers in RTL, the *.md patterns should be
fixed.  The only exception should IMHO be the __SYNC_MEM_RELAXED
variants - if the CPU can reorder memory accesses across them at will,
why shouldn't the compiler be able to do the same as well?
        
Yeah, some of this is part of the ongoing C++0x work... the memory model parameter is going to allow certain types of code movement in the optimizers based on whether it's an acquire operation, a release operation, neither, or both. It is ongoing, and hopefully we will eventually have proper consistency. The older __sync builtins will eventually invoke the new __sync_mem routines and their new patterns, but will fall back to the old ones if the new patterns aren't specified.

In the case of your program, this would in fact be a valid transformation, I believe... __sync_lock_test_and_set is documented to have only ACQUIRE semantics. This does not guarantee that a store BEFORE the operation will be visible in another thread, which means it is possible to reorder it. (A summary of the different modes can be found at http://gcc.gnu.org/wiki/Atomic/GCCMM/Optimizations.) So you would require a barrier before this code anyway for the behaviour you are looking for. Once the new routines are available and implemented, you could simply specify the SEQ_CST model, and then it should, in theory, work properly, with a barrier being emitted for you.

I don't see anything in this pattern, however, that would enforce acquire mode and prevent the reverse motion... moving something from after the operation to before it... so there may be a bug there anyway.

And I suspect most people actually expect all the old __sync routines to be full optimization barriers all the time... maybe we should consider just doing that...

Andrew
