https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67461
--- Comment #2 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Andrew Pinski from comment #1)
> Hmm, I think there needs to be a barrier between each store as each store
> needs to be observed by the other threads.

On x86, stores are already ordered wrt. other stores. A full barrier (including a StoreLoad barrier) after the last store will prevent it from passing (appearing after) any subsequent loads.

StoreStore, LoadLoad, and LoadStore barriers are implicit between every memory operation (except non-temporal ones).

http://preshing.com/20120710/memory-barriers-are-like-source-control-operations/

I *think* that's enough for sequential consistency. If *I'm* misunderstanding this (which is possible), then please clue me in.

There's definitely a problem on ARM, though. There's no way two consecutive dmb sy instructions are useful.
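
To make the x86 argument concrete, here is a minimal sketch of the two shapes being compared. The variables `a`, `b`, `c` and both function names are made up for illustration; they are not taken from the testcase attached to this bug. The first function is the pattern under discussion (back-to-back seq_cst stores). The second spells out by hand the instruction sequence I'm arguing would be sufficient on x86: ordinary stores followed by one full barrier. It is not meant as a claim that the two are interchangeable under the C++ memory model in general, only as a picture of the x86 code I'd hope to see for the first one.

```cpp
#include <atomic>

// Hypothetical globals, for illustration only.
std::atomic<int> a{0}, b{0}, c{0};

// The pattern under discussion: consecutive sequentially consistent stores.
// The complaint is that each store gets its own full barrier.
void store_all_seq_cst(int v) {
    a.store(v, std::memory_order_seq_cst);
    b.store(v, std::memory_order_seq_cst);
    c.store(v, std::memory_order_seq_cst);
}

// Hand-written approximation of the proposed x86 code: on x86 a release
// store is an ordinary store, and stores are not reordered with other
// stores, so only one full (StoreLoad) barrier is placed after the last one.
// Assumption: this models the x86 mapping only, not the abstract C++ rules.
void store_all_one_fence(int v) {
    a.store(v, std::memory_order_release);
    b.store(v, std::memory_order_release);
    c.store(v, std::memory_order_release);
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```

Comparing the `-O2` asm for the two functions on x86 (and on ARM, where the consecutive `dmb sy` shows up) is a quick way to see how many barriers the compiler emits for each.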