On 09/09/2011 04:17 AM, Jakub Jelinek wrote:
> On Fri, Sep 09, 2011 at 10:07:30AM +0200, Paolo Bonzini wrote:
>> sync builtins are described in the documentation as being full
>> memory barriers, with the possible exception of
>> __sync_lock_test_and_set. However, GCC is not enforcing the fact
>> that they are also full _optimization_ barriers. The RTL produced
>> by the builtins does not in general include a memory optimization
>> barrier such as a set of (mem/v:BLK (scratch:P)).
>> This can cause problems with lock-free algorithms, for example this:
>> http://libdispatch.macosforge.org/trac/ticket/35
>> This can be solved either in generic code, by wrapping sync builtins
>> (before and after) with an asm("":::"memory"), or in the individual
>> machine descriptions by adding a memory barrier in parallel to the
>> locked instructions or to the ll/sc instructions.
>> Is the above analysis correct? Or should users insert explicit
>> compiler barriers themselves?
> I'd say they should be optimization barriers too (and at the tree level
> I think they work that way, being represented as function calls), so if
> they don't act as memory barriers in RTL, the *.md patterns should be
> fixed. The only exception should be IMHO the __SYNC_MEM_RELAXED
> variants - if the CPU can reorder memory accesses across them at will,
> why shouldn't the compiler be able to do the same as well?
Yeah, some of this is part of the ongoing C++0x work... the memory model
parameter is going to allow certain types of code movement in optimizers
based on whether it's an acquire operation, a release operation, neither,
or both. It is ongoing and hopefully we will eventually have proper
consistency. The older __sync builtins are eventually going to invoke
the new __sync_mem routines and their new patterns, but will fall back to
the old ones if new patterns aren't specified.
In the case of your program, this would in fact be a valid
transformation I believe... __sync_lock_test_and_set is documented to
only have ACQUIRE semantics. This does not guarantee that a store BEFORE
the operation will be visible in another thread, which means it is possible
to reorder it. (A summary of the different modes can be found at
http://gcc.gnu.org/wiki/Atomic/GCCMM/Optimizations.) So you would
require a barrier before this code anyway for the behaviour you are
looking for. Once the new routines are available and implemented, you
could simply specify the SEQ_CST model and then it should, in theory,
work properly with a barrier being emitted for you.
I don't see anything in this pattern however that would enforce acquire
mode and prevent the reverse operation... moving something from after to
before it... so there may be a bug there anyway.
And I suspect most people actually expect all the old __sync routines to
be full optimization barriers all the time... maybe we should consider
just doing that...
Andrew