https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
--- Comment #32 from torvald at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #28)
> (In reply to torvald from comment #24)
> > 3) We could do something just on ARM (and scan other arcs for similar
> > issues). That's perhaps the cleanest option.
>
> Which leaves 3). From Andrew's two proposed solutions:

3) also seems best to me. 2) is worst, 1) is too much of a stick.

> This also gives us an easier route to fixing any issues with the
> acquire/release __sync primitives (__sync_lock_test_and_set and
> __sync_lock_release) if we decide that these also need to be stronger than
> their C++11 equivalents.

I don't think we have another case of different __sync vs. __atomics
semantics in the case of __sync_lock_test_and_set. The current specification
makes it clear that this is an acquire barrier, and the way it describes the
semantics (i.e., loads and stores that are program-order before the acquire op
can move to after it) seems to be consistent with the effects C11 specifies
for acquire MO (with perhaps the distinction that C11 is clear that acquire
needs to be paired with some release op to create an ordering constraint).

I'd say this basically also applies to __sync_lock_release, with the exception
that the current documentation does not mention that stores can be speculated
to before the barrier. That seems to be an artefact of a TSO model. However, I
don't think this matters much, because what the release barrier allows one to
do is reason that if one sees that the barrier has taken place (e.g., observes
that the lock has been released), then all ops before the barrier will also be
visible. It does not guarantee that if one observes an effect that is after
the barrier in program order, the barrier itself will necessarily have taken
effect. To be able to make this observation, one would have to ensure, using
__sync ops, that the other effect after the barrier is indeed after the
barrier, which would mean using a release op for the other effect -- which
would take care of things.

If everyone agrees with this reasoning, we probably should add documentation
explaining this.

However, I guess some people relying on data races in their programs could
(mis?)understand the __sync_lock_release semantics to mean that it is a means
to get the equivalent of a C11 release *fence* -- which it is not, because the
fence would apply to the (erroneously non-atomic) store after the barrier,
which could lead one to believe that if one observes the store after the
barrier, the fence must also be in effect.

Thoughts?
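
To make the release/acquire reasoning above concrete, here is a minimal sketch
of the one guarantee being relied on: a thread that observes the lock as
released also sees the stores made before the release. This is an illustration
only, not code from GCC or from this bug; the names lock_word, payload,
producer and run_handoff are hypothetical, and POSIX threads are assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
static int lock_word = 1;  /* starts out "held" on behalf of the producer */
static int payload;        /* plain, non-atomic data guarded by lock_word */

static void *producer(void *arg)
{
    payload = *(int *)arg;            /* store before the release barrier */
    __sync_lock_release(&lock_word);  /* release op: writes 0 to lock_word */
    return NULL;
}

/* Returns the payload as seen by the consumer after it acquired the lock. */
int run_handoff(int value)
{
    pthread_t t;
    int seen;
    int rc;

    lock_word = 1;
    payload = 0;

    rc = pthread_create(&t, NULL, producer, &value);
    assert(rc == 0);
    (void)rc;

    /* Acquire op: spin until the previous value was 0, i.e. until we have
       observed the producer's release and taken the lock ourselves. */
    while (__sync_lock_test_and_set(&lock_word, 1) != 0)
        ;

    /* Ordered after the acquire, which paired with the release above,
       so the producer's store to payload is visible here. */
    seen = payload;

    pthread_join(t, NULL);
    return seen;
}
```

Calling run_handoff(42) is expected to return 42. The sketch deliberately does
not show the converse case: a plain store placed after __sync_lock_release in
the producer would not let a reader that sees that store conclude that the
release has taken effect, which is exactly where it differs from a C11 release
fence.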