https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115802
--- Comment #5 from Ivan Bodrov <securesneakers at gmail dot com> ---
Without the mutex, threads would race for the first access to the non-atomic variable. Seq-Cst ordering is only used to simplify the example; it can be relaxed to acquire/release/relaxed for the different operations. I didn't want people to spend too much time thinking about it.

I guess an even shorter demonstration would be using inline asm with a memory clobber:

    while (!val)
        __asm__ volatile ("":::"memory");

The above forces GCC to re-load "val" on every iteration, and the code compiles to:

    .L8:
        mov     edx, DWORD PTR val[rip]
        test    edx, edx
        je      .L8

But if the same fence is within a function, it won't have any effect:

    __attribute__((__noinline__))
    void fence(void)
    {
        __asm__ volatile ("":::"memory");
    }
    ...
    while (!val)
        fence();

Compiles to:

    .L8:
        call    func
        jmp     .L8
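For anyone who wants to compare the two loops side by side, here is a minimal sketch that puts both variants in one translation unit. The names wait_inline_barrier and wait_noinline_fence are hypothetical, added only for illustration; "val" and "fence" follow the snippets above. I am assuming "val" is a plain global int. Whether the load is actually hoisted out of the second loop will likely depend on the GCC version and optimization level, so treat this as a starting point for inspecting the output of something like "gcc -O2 -S", not as a definitive reproducer.

```c
/* Sketch only: compile to assembly and compare the two loops. */

int val;  /* plain non-atomic flag (assumed to be a global int) */

/* Out-of-line compiler barrier, as in the snippet from the report. */
__attribute__((__noinline__)) void fence(void)
{
    __asm__ volatile ("" ::: "memory");
}

/* Hypothetical name: the barrier is written directly in the loop body,
 * so the compiler has to re-load val on every iteration. */
int wait_inline_barrier(void)
{
    while (!val)
        __asm__ volatile ("" ::: "memory");
    return val;
}

/* Hypothetical name: same loop, but the barrier sits behind a noinline
 * call. This is the variant reported to lose the re-load of val. */
int wait_noinline_fence(void)
{
    while (!val)
        fence();
    return val;
}
```

Calling either function while val is still 0 only makes sense if another thread sets it later, which is a data race on a non-atomic variable just like in the original example. With val already non-zero, both functions return immediately.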