https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104480
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed  |Added
----------------------------------------------------------------------------
           Version |unknown  |12.0

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I don't think [intro.memory]/3 (wherever that is supposed to point to?) is
realized this way on CPUs with weaker memory-ordering guarantees than x86.
And we definitely do not ensure atomicity or commit order unless you use
atomic access primitives. So I think this report is invalid.

As Andrew says, we have been happily combining

  void foo (double * __restrict a, double *b)
  {
    a[0] = b[0];
    a[1] = b[1];
  }

into

  movupd (%rsi), %xmm0
  movups %xmm0, (%rdi)

via vectorization since forever, and that combined store has the exact same
issue when it crosses a cacheline boundary.
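For illustration, a minimal sketch (not from this PR) of what "atomic access
primitives" means here, assuming C11 <stdatomic.h>: with explicit atomic
stores the compiler is not allowed to merge the two element stores into a
single vector store or reorder them past each other.

  /* Hypothetical variant of foo, not part of the original comment:
     each element is stored with an individually atomic, release-ordered
     store, so the pair may not be combined into one unaligned vector
     store.  */
  #include <stdatomic.h>

  void foo_atomic (_Atomic double * __restrict a, const double *b)
  {
    /* Each store is atomic on its own; memory_order_release orders it
       after preceding memory operations in this thread.  */
    atomic_store_explicit (&a[0], b[0], memory_order_release);
    atomic_store_explicit (&a[1], b[1], memory_order_release);
  }

On x86-64 one would expect this to compile to two separate 8-byte stores
rather than a single movups, preserving per-element atomicity even when the
pair would straddle a cacheline boundary.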