https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80640
Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |UNCONFIRMED
     Ever confirmed|1                           |0

--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
The issue boils down to just

void f(int *p)
{
  while (*p)
    __atomic_thread_fence(2);
}

which with -O2 -fno-tree-ter is compiled to

f:
        movl    (%rdi), %eax    # *p_3(D), _4
.L6:
        testl   %eax, %eax      # _4
        jne     .L6     #,
        rep ret

the .optimized dump looks as expected, but then __atomic_thread_fence(2) is
expanded into nothing, so the load is hoisted during RTL transforms.

Note that the source declares opal_list_next as

    volatile struct opal_list_item_t *opal_list_next;

but the 'volatile' qualifier applies to the pointed-to struct, not the field
itself. If written as

    volatile struct opal_list_item_t *volatile opal_list_next;

then the problematic hoisting does not happen.
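
For illustration only, here is a minimal sketch of the qualifier placement described above. The names struct item, next_plain, next_volatile, wait_until_zero and wait_for_next are hypothetical stand-ins, not taken from the Open MPI source; only the link field of opal_list_item_t is modelled. The sketch assumes GCC's __atomic builtins, where memory order 2 corresponds to __ATOMIC_ACQUIRE.

```c
#include <stddef.h>

/* Hypothetical stand-in for opal_list_item_t; only the link field matters. */
struct item {
    /* Pointer to a volatile struct: the pointee is volatile, the field is
       not, so the compiler may load the field once and reuse the value. */
    volatile struct item *next_plain;

    /* Volatile pointer to a volatile struct: each read of the field itself
       is a volatile access and has to be performed as written. */
    volatile struct item *volatile next_volatile;
};

/* Variant of the reduced test case that reads through a volatile-qualified
   lvalue, so the load is repeated on every iteration. */
void wait_until_zero(int *p)
{
    while (*(volatile int *)p)
        __atomic_thread_fence(__ATOMIC_ACQUIRE); /* written as 2 above */
}

/* Spins until the volatile-qualified link becomes non-NULL, then returns it. */
volatile struct item *wait_for_next(struct item *it)
{
    volatile struct item *n;

    while ((n = it->next_volatile) == NULL)
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
    return n;
}
```

With the field declared like next_plain, the loop in wait_for_next would be open to the same hoisting as f() above; the second 'volatile' (or the cast in wait_until_zero) is what makes each iteration re-read memory. This only shows where the qualifier goes and is not meant as a recommendation of volatile for inter-thread synchronization.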