https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104831
--- Comment #5 from Patrick O'Neill <patrick at rivosinc dot com> ---
IIUC, Appendix A is incorrect. We cannot allow any memory ops to enter within the LR/SC pair, since a reordering like that is visible to other threads.

Here's a litmus test showing this fact:

(* LR/SC with .aq .rl bits does not allow read operations to be reordered within/beneath it. *)
{
0:x6=a; 0:x8=b;
1:x6=a; 1:x8=b;
}
 P0                     | P1           ;
 lw x5,0(x6)            | ori x1,x0,1  ;
 lr.w.aq.rl x7,0(x8)    | sw x1,0(x8)  ;
 ori x1,x0,1            | fence rw,rw  ;
 sc.w.aq.rl x1,x1,0(x8) | sw x1,0(x6)  ;
~exists (0:x5=1 /\ 0:x7=0 /\ b=1)

In a sequentially consistent atomic operation (which this LR/SC pair is emulating), it is not possible both for x5 to be loaded with a 1 and for the LR/SC pair to load/operate on a 0. With the pairing of LR.aq/SC.aqrl this outcome is possible.

For LR.aqrl/SC.rl, the corresponding reordering of writes needs to be forbidden:

RISCV LRSC-WRITE
(* LR/SC with .aq .rl bits does not allow write operations to be reordered within/above it. *)
{
0:x8=b; 0:x10=c;
1:x8=b; 1:x10=c;
}
 P0                     | P1           ;
 ori x9,x0,1            | lw x9,0(x10) ;
 lr.w.aq.rl x7,0(x8)    | fence rw,rw  ;
 ori x7,x0,1            | lw x7,0(x8)  ;
 sc.w.aq.rl x1,x7,0(x8) |              ;
 sw x9,0(x10)           |              ;
~exists (1:x9=1 /\ 1:x7=0 /\ b=1)

In a sequentially consistent atomic operation, it is not possible both for Hart 1's x9 to be loaded with a 1 and for Hart 1's x7 to be loaded with a 0 (as long as the SC succeeds, which b=1 enforces). That outcome is possible with LR.aqrl/SC.rl, since operations can get reordered within the LR/SC pairing.
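As a rough sanity check of the argument above, here is a small Python sketch that brute-forces every interleaving of the two litmus tests under plain sequential consistency. It is not an RVWMO model and is no substitute for running the tests through herd; the op encoding and every name in it (run, reachable, READ_REORDERED, ...) are hypothetical and exist only for this illustration. Register-only instructions (ori) and the fences are left out, since they change nothing when every access is already executed in order. The "reordered" variants hand-move one access across the LR/SC pair to mimic the reordering that weaker annotations would permit.

```python
from itertools import combinations

def interleavings(n0, n1):
    """Yield every merge of two instruction streams as a tuple of hart ids."""
    total = n0 + n1
    for slots in combinations(range(total), n0):
        chosen = set(slots)
        yield tuple(0 if i in chosen else 1 for i in range(total))

def write(mem, reservation, hart, addr, value):
    """Store to memory and break the other hart's reservation on addr."""
    mem[addr] = value
    other = 1 - hart
    if reservation[other] == addr:
        reservation[other] = None

def run(threads, schedule):
    """Execute one interleaving in order (sequential consistency)."""
    mem = {"a": 0, "b": 0, "c": 0}
    regs = [{}, {}]
    reservation = [None, None]  # address reserved by each hart's LR
    pc = [0, 0]
    for hart in schedule:
        op, *args = threads[hart][pc[hart]]
        pc[hart] += 1
        if op == "load":
            reg, addr = args
            regs[hart][reg] = mem[addr]
        elif op == "store":
            addr, value = args
            write(mem, reservation, hart, addr, value)
        elif op == "lr":
            reg, addr = args
            regs[hart][reg] = mem[addr]
            reservation[hart] = addr
        elif op == "sc":
            reg, addr, value = args
            if reservation[hart] == addr:
                write(mem, reservation, hart, addr, value)
                regs[hart][reg] = 0  # SC succeeded
            else:
                regs[hart][reg] = 1  # SC failed
            reservation[hart] = None
    return mem, regs

def reachable(threads, outcome):
    """True if some interleaving ends in a state satisfying outcome."""
    n0, n1 = len(threads[0]), len(threads[1])
    return any(outcome(*run(threads, s)) for s in interleavings(n0, n1))

# First test: a read placed before the LR/SC pair.
READ_P1 = [("store", "b", 1), ("store", "a", 1)]
READ_PROGRAM_ORDER = ([("load", "x5", "a"), ("lr", "x7", "b"), ("sc", "x1", "b", 1)], READ_P1)
# Hand-reordered: the lw has sunk beneath the LR/SC pair.
READ_REORDERED = ([("lr", "x7", "b"), ("sc", "x1", "b", 1), ("load", "x5", "a")], READ_P1)

def read_outcome(mem, regs):
    return regs[0]["x5"] == 1 and regs[0]["x7"] == 0 and mem["b"] == 1

# Second test: a write placed after the LR/SC pair.
WRITE_P1 = [("load", "x9", "c"), ("load", "x7", "b")]
WRITE_PROGRAM_ORDER = ([("lr", "x7", "b"), ("sc", "x1", "b", 1), ("store", "c", 1)], WRITE_P1)
# Hand-reordered: the sw has risen above the LR/SC pair.
WRITE_REORDERED = ([("store", "c", 1), ("lr", "x7", "b"), ("sc", "x1", "b", 1)], WRITE_P1)

def write_outcome(mem, regs):
    return regs[1]["x9"] == 1 and regs[1]["x7"] == 0 and mem["b"] == 1

if __name__ == "__main__":
    print("read test, program order:", reachable(READ_PROGRAM_ORDER, read_outcome))
    print("read test, lw sunk below:", reachable(READ_REORDERED, read_outcome))
    print("write test, program order:", reachable(WRITE_PROGRAM_ORDER, write_outcome))
    print("write test, sw hoisted above:", reachable(WRITE_REORDERED, write_outcome))
```

In this toy model the flagged outcome is unreachable in program order for both tests and becomes reachable once the single access is moved across the pair, which is the behaviour the two ~exists clauses are meant to rule out.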