https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67461
--- Comment #4 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Andrew Pinski from comment #1)
> Hmm, I think there needs to be a barrier between each store as each store
> needs to be observed by the other threads.

As we agreed earlier, a full barrier is only needed after the last store, and
any weaker barriers happen for free on x86.

But it turns out that the standard doesn't even require that other threads
have a possibility of observing each store separately. It's possible for
observers to see the a=1; a=1; a=3; all happen together, so we can decide at
compile time that that's what every observer will always see (by collapsing
them to one store).

This is a fairly aggressive optimization that violates the principle of least
surprise for some users, and there is discussion about clarifying this and
giving programmers control over it. See http://wg21.link/p0062 ("When should
compilers optimize atomics?") and also http://wg21.link/n4455.

There was also some discussion about this on
http://stackoverflow.com/questions/39393850/can-num-be-atomic-for-int-num.
See the last section of my answer there (the accepted one) for the results of
discussing this with Richard Hodges in comments on his answer.
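
To make the collapsing argument concrete, here is a minimal sketch. The
function names store_three and store_collapsed are made up for illustration,
and the second function is only what a compiler would be allowed to produce
under this reading of the standard, not what GCC currently emits.

```cpp
#include <atomic>

std::atomic<int> a{0};

// Three back-to-back seq_cst stores to the same atomic object.
void store_three() {
    a = 1;
    a = 1;
    a = 3;
}

// Hypothetical result of collapsing them: one seq_cst store of the final
// value. An observer that only ever sees the old value or 3 is seeing a
// legal execution of store_three() too (one where its loads never landed
// between the stores), so no conforming program can tell the two apart.
void store_collapsed() {
    a.store(3, std::memory_order_seq_cst);
}
```

On x86 the collapsed version needs just one store followed by a full barrier
(or a single xchg), which is the "barrier only after the last store" case
taken one step further.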