On 09/09/2011 04:17 AM, Jakub Jelinek wrote:
> On Fri, Sep 09, 2011 at 10:07:30AM +0200, Paolo Bonzini wrote:
>> sync builtins are described in the documentation as being full
>> memory barriers, with the possible exception of
>> __sync_lock_test_and_set. However, GCC is not enforcing the fact
>> that they are also full _optimization_ barriers. The RTL produced
>> by the builtins does not in general include a memory optimization
>> barrier such as a set of (mem/v:BLK (scratch:P)).
>> This can cause problems with lock-free algorithms, for example this:
>> http://libdispatch.macosforge.org/trac/ticket/35
>> This can be solved either in generic code, by wrapping sync builtins
>> (before and after) with an asm("":::"memory"), or in the individual
>> machine descriptions by adding a memory barrier in parallel to the
>> locked instructions or to the ll/sc instructions.
>> Is the above analysis correct? Or should users insert explicit
>> compiler barriers themselves?
> I'd say they should be optimization barriers too (and at the tree level
> I think they work that way, being represented as function calls), so if
> they don't act as memory barriers in RTL, the *.md patterns should be
> fixed. The only exception should be IMHO the __SYNC_MEM_RELAXED
> variants - if the CPU can reorder memory accesses across them at will,
> why shouldn't the compiler be able to do the same as well?
Yeah, some of this is part of the ongoing C++0x work... the memory model
parameter is going to allow certain types of code movement in optimizers
based on whether it's an acquire operation, a release operation, neither,
or both. It is ongoing and hopefully we will eventually have proper
consistency. The older __sync builtins are eventually going to invoke
the new __sync_mem routines and their new patterns, but will fall back to
the old ones if new patterns aren't specified.
In the case of your program, this would in fact be a valid
transformation I believe... __sync_lock_test_and_set is documented to
only have ACQUIRE semantics. This does not guarantee that a store BEFORE
the operation will be visible in another thread, which means it is possible
to reorder it. (A summary of the different modes can be found at
http://gcc.gnu.org/wiki/Atomic/GCCMM/Optimizations.) So you would
require a barrier before this code anyway for the behaviour you are
looking for. Once the new routines are available and implemented, you
could simply specify the SEQ_CST model and then it should, in theory,
work properly with a barrier being emitted for you.
I don't see anything in this pattern however that would enforce acquire
mode and prevent the reverse operation... moving something from after to
before it... so there may be a bug there anyway.
And I suspect most people actually expect all the old __sync routines to
be full optimization barriers all the time... maybe we should consider
just doing that...
Andrew