Hi all,
the __sync builtins are described in the documentation as being full
memory barriers, with the possible exception of __sync_lock_test_and_set.
However, GCC does not enforce that they are also full _optimization_
barriers. The RTL produced for the builtins does not in general include
a memory optimization barrier such as a set of
(mem/v:BLK (scratch:P)).
This can cause problems with lock-free algorithms, for example this:
http://libdispatch.macosforge.org/trac/ticket/35
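
To make the concern concrete, here is a minimal sketch of the kind of
code that relies on the builtin being an optimization barrier. It is
not taken from the ticket above; the struct and the function name push
are made up for illustration. Whether a given GCC version actually
moves the plain store depends on the target and the optimization level.

```c
#include <stddef.h>

/* Hypothetical lock-free stack node, for illustration only. */
struct node {
    int payload;
    struct node *next;
};

static struct node *head;

/* Publish a node. The plain store to n->payload must not be moved
   below the compare-and-swap by the compiler, or another thread could
   observe the node through head before its payload is written. */
static void push(struct node *n, int value)
{
    n->payload = value;              /* plain store, no barrier of its own */
    do {
        n->next = head;              /* snapshot the current head */
    } while (!__sync_bool_compare_and_swap(&head, n->next, n));
}
```

The code only works as intended if the compiler treats the builtin as
a point that no memory access can be moved across.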
This can be solved either in generic code, by wrapping sync builtins
(before and after) with an asm("":::"memory"), or in the individual
machine descriptions, by adding a memory barrier in parallel to the
locked instructions or to the ll/sc instructions.
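
For reference, the user-level equivalent of the generic-code option
would look roughly like the sketch below. The names compiler_barrier
and cas_full_barrier are hypothetical, not part of GCC, and this only
shows the wrapping pattern, not how it would be done inside the
compiler.

```c
/* Compiler-only barrier: emits no instructions, but tells GCC that
   memory may have changed, so no access is cached or moved across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical wrapper: compare-and-swap with an explicit optimization
   barrier before and after the builtin. Returns nonzero on success. */
static inline int cas_full_barrier(int *ptr, int oldval, int newval)
{
    int ok;

    compiler_barrier();   /* earlier accesses stay above the builtin */
    ok = __sync_bool_compare_and_swap(ptr, oldval, newval);
    compiler_barrier();   /* later accesses stay below the builtin */
    return ok;
}
```

Doing this once in generic code would cover every target; doing it in
the machine descriptions keeps the barrier attached to the instruction
patterns themselves.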
Is the above analysis correct? Or should users insert explicit
compiler barriers themselves?
Paolo