https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68616
--- Comment #2 from torvald at gcc dot gnu.org ---
I basically don't know anything about IPA, so I'll just assume that by "barrier" you mean conditions prohibiting transformations. I'm also not sure whether just CSE is a problem here, so I'll try to give an unspecific but broad answer.

I'll assume we're looking at cases of nonatomic accesses to foo (such as ptr in the test case) and calls to the (new) atomic builtins on different variables bar1,bar2,... as synchronization points. Mixed atomic/nonatomic accesses to the same memory location are in most cases a bug, and I believe we discourage them, but they might not generally be incorrect (e.g., if loading through an atomic op and observing a value that indicates we're the only thread accessing it, subsequent nonatomic accesses might be fine); maybe we should just let an atomic access to bar1 be a barrier for all movement/merging across/involving this access.

Any old atomic builtins (i.e., __sync) can probably be handled like the equivalent calls to the new builtins. I believe we don't need to include volatiles, because even if they are used in old-style code, they should have asm compiler barriers around them -- and I hope we're handling those correctly.

Because &foo != &bar, atomic stores to bar must be __ATOMIC_RELEASE or stronger, and atomic loads from bar must be __ATOMIC_ACQUIRE or stronger, for them to matter here; otherwise, there's no implied ordering relationship between the foo and bar accesses.

A good way to find out what transformations are or are not allowed is to consider the data-race-freedom (DRF) requirement and which regions an implementation would be allowed to execute atomically. For example, "foo = 1; bar.store(1, release); foo = 2;": The implementation is allowed to execute that in one atomic step, so there cannot be a nonatomic read of foo in another thread that's triggered by the store to bar, because there would be a data race in this case. So, foo=1 can be removed if we assume DRF.
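To make the dead-store case concrete, here is a minimal sketch in C++ spelling (std::atomic instead of the __atomic builtins). The struct and function names are made up for illustration; this only shows the before/after shape of the transformation and is not what GCC actually emits.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state: foo is plain data, bar is the sync variable.
struct Shared {
    int foo = 0;
    std::atomic<int> bar{0};
};

// The example as written.
void writer_as_written(Shared& s) {
    s.foo = 1;                                   // overwritten below
    s.bar.store(1, std::memory_order_release);
    s.foo = 2;
}

// What an implementation may turn it into if it assumes DRF: a reader that
// is triggered by bar == 1 and then reads foo would race with "foo = 2",
// so no race-free program can observe foo == 1.
void writer_after_dse(Shared& s) {
    s.bar.store(1, std::memory_order_release);
    s.foo = 2;
}

// Single-threaded sanity check: both versions leave the same final state.
void check_same_final_state() {
    Shared a, b;
    writer_as_written(a);
    writer_after_dse(b);
    assert(a.foo == b.foo);
    assert(a.bar.load() == b.bar.load());
}
```

The check only compares final states in one thread; the argument that no other thread can tell the two versions apart is the DRF reasoning above, not something this code proves.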
(Note that old synchronization code may not be written to guarantee DRF, so perhaps this should be conditional on -std=c11 or such.) "temp=foo; if (bar.load(acquire)) temp2=foo;" is similar.

However, this example here (basically the test case) doesn't allow for optimizing across the synchronization: "foo = 1; temp=foo; bar.store(1, release); if (bar.load(acquire)==2) temp2=foo;" This is because the two loads of foo cannot be executed in one atomic step, as it needs another thread to increment bar to the value 2. Also, there is no nonatomic access to foo between the two atomic accesses, so we cannot use DRF to derive that the other threads that execute between the atomics don't access foo.

I hope this helps, though I'm aware it's pretty general. If you have a list of specific transformations, I could look through that and say which ones are correct or not. Perhaps it's okay for now if we assume that all non-relaxed atomics are transformation barriers, as we do elsewhere AFAIK.
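For the test-case pattern, a race-free second thread that makes the two loads of foo differ could look like the sketch below. This is my own illustration in C++ spelling: thread_b, the value 42, and the spin loops are assumptions (the original example uses a single "if"; spinning just makes the outcome deterministic).

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

int foo = 0;               // plain, nonatomic data
std::atomic<int> bar{0};   // synchronization variable

// Thread A: the pattern from the example, returning {temp, temp2}.
std::pair<int, int> thread_a() {
    foo = 1;
    int temp = foo;
    bar.store(1, std::memory_order_release);
    while (bar.load(std::memory_order_acquire) != 2) {
        std::this_thread::yield();
    }
    int temp2 = foo;       // must be reloaded, not replaced by temp
    return {temp, temp2};
}

// Thread B (hypothetical): runs between A's two atomic accesses.
void thread_b() {
    while (bar.load(std::memory_order_acquire) != 1) {
        std::this_thread::yield();
    }
    foo = 42;              // ordered after A's "temp = foo" via bar == 1
    bar.store(2, std::memory_order_release);
}

// Runs both threads once and checks that the two loads really differ.
std::pair<int, int> run_example() {
    foo = 0;
    bar.store(0);
    std::thread b(thread_b);
    std::pair<int, int> result = thread_a();
    b.join();
    assert(result.first != result.second);
    return result;
}
```

Every access to foo here is ordered by a release/acquire pair on bar, so the program is DRF, and yet temp is 1 while temp2 is 42. Reusing temp for the second load would therefore change the behavior of a correct program.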