https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
--- Comment #35 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to torvald from comment #32)
> (In reply to James Greenhalgh from comment #28)
> > This also gives us an easier route to fixing any issues with the
> > acquire/release __sync primitives (__sync_lock_test_and_set and
> > __sync_lock_release) if we decide that these also need to be stronger than
> > their C++11 equivalents.
>
> I don't think we have another case of different __sync vs. __atomics
> semantics in case of __sync_lock_test_and_set. The current specification
> makes it clear that this is an acquire barrier, and how it describes the
> semantics (ie, loads and stores that are program-order before the acquire op
> can move to after it) , this seems to be consistent with the effects C11
> specifies for acquire MO (with perhaps the distinction that C11 is clear
> that acquire needs to be paired with some release op to create an ordering
> constraint).

I think that the question is which parts of an RMW operation with
MEMMODEL_ACQUIRE semantics are ordered. My understanding is that in C++11
MEMMODEL_ACQUIRE only applies to the "load" half of the operation. So an
observer of:

  atomic_flag_test_and_set_explicit (foo, memory_order_acquire)
  atomic_store_explicit (bar, 1, memory_order_relaxed)

is permitted to observe a write to bar before a write to foo (but not before
the read from foo).

My reading of the Itanium ABI is that the acquire barrier applies to the
entire operation (Andrew, I think you copied these over exactly backwards in
comment 34 ;) ):

  "Disallows the movement of memory references to visible data from
   after the intrinsic (in program order) to before the intrinsic (this
   behavior is desirable at lock-acquire operations, hence the name)."

The definition of __sync_lock_test_and_set is:

  "Behavior:
   • Atomically store the supplied value in *ptr and return the old value
     of *ptr. (i.e.)
       { tmp = *ptr; *ptr = value; return tmp; }
   • Acquire barrier."

So by the strict letter of the specification, no memory references to visible
data should be allowed to move from after the entire body of the intrinsic to
before it. That is to say, in:

  __sync_lock_test_and_set (foo, 1)
  bar = 1

an observer should not be able to observe the write to bar before the write to
foo. This is a difference from the C++11 semantics.

I'm not worried about __sync_lock_release; I think the documentation is strong
enough and unambiguous.
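
For illustration, a minimal sketch of the two snippets side by side, assuming GCC or Clang (for the __sync builtin) and C11 <stdatomic.h>. The variables foo, bar, sync_foo, sync_bar and all function names are hypothetical. atomic_exchange_explicit on an atomic_int stands in for atomic_flag_test_and_set_explicit so that an observer can actually load foo (C11 has no plain load for atomic_flag). The release fence in the second function is only one way, in C11 terms, to forbid the particular store-to-store reordering discussed here; it does not by itself order later loads against the exchange's store, and it is not a description of what GCC emits for __sync_lock_test_and_set.

```c
#include <stdatomic.h>

/* Hypothetical shared variables standing in for foo and bar. */
static atomic_int foo;
static atomic_int bar;

/* C11 reading: memory_order_acquire orders only the load half of the
   exchange, so another thread may see bar == 1 while foo still reads 0. */
int tas_acquire_then_store(void)
{
    int old = atomic_exchange_explicit(&foo, 1, memory_order_acquire);
    atomic_store_explicit(&bar, 1, memory_order_relaxed);
    return old;
}

/* Assumed way to get the stronger ordering for this example in C11 terms:
   the release fence is sequenced after the exchange and before the store
   to bar, so a thread that reads this bar == 1 with an acquire load must
   also see the exchange's write to foo. */
int tas_acquire_fence_then_store(void)
{
    int old = atomic_exchange_explicit(&foo, 1, memory_order_acquire);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&bar, 1, memory_order_relaxed);
    return old;
}

/* The __sync form. Under the reading of the Itanium ABI wording above,
   the acquire barrier covers the whole intrinsic, so the store to
   sync_bar should not become visible before the store to sync_foo. */
static int sync_foo;
static int sync_bar;

int sync_tas_then_store(void)
{
    int old = __sync_lock_test_and_set(&sync_foo, 1);
    sync_bar = 1;
    return old;
}

/* Observer: returns -1 if bar has not been written yet. If the bar == 1
   it reads was written by tas_acquire_fence_then_store, the acquire load
   synchronizes with that function's release fence, so foo must read 1. */
int observe(void)
{
    if (atomic_load_explicit(&bar, memory_order_acquire) == 1)
        return atomic_load_explicit(&foo, memory_order_relaxed);
    return -1;
}
```

Run single-threaded, these functions only show the values each one produces; the ordering difference itself is only observable with a concurrent observer on a weakly ordered target such as AArch64.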