https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71245
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Peter Cordes from comment #0)
> We don't need to allocate any stack space. We could implement the StoreLoad
> barrier with lock or $0, -4(%esp) instead of reserving extra stack to avoid
> doing it to our return address (which would introduce extra store-forwarding
> delay before the ret could eventually retire).

This can be trivially implemented in config/i386/sync.md by changing

(define_insn "mfence_nosse"
  [(set (match_operand:BLK 0)
        (unspec:BLK [(match_dup 0)] UNSPEC_MFENCE))
   (clobber (reg:CC FLAGS_REG))]
  "!(TARGET_64BIT || TARGET_SSE2)"
  "lock{%;} or{l}\t{$0, (%%esp)|DWORD PTR [esp], 0}"
  [(set_attr "memory" "unknown")])

Recently, x86 Linux changed its barrier to what you propose. If it is
worthwhile, we can change it without any problems.

OTOH, we have "orl" here - should we change it to "addl" to be consistent
with the kernel?
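
For anyone who wants to try the proposed sequence outside the compiler, here is a minimal sketch in C with GNU inline asm. It is an illustration only, not the sync.md change itself. The names storeload_barrier, enter_a, flag_a and flag_b are hypothetical, and the x86-64 branch and the generic fallback are assumptions added so the snippet builds on other targets. It keeps "orl" as in the existing pattern; swapping in "addl" would give the kernel-style variant.

```c
/* Hypothetical helper: StoreLoad barrier via a locked read-modify-write
   on the word just below the stack pointer, so the return address at
   (%esp) is not touched.  ORing with 0 leaves the memory unchanged. */
static inline void storeload_barrier(void)
{
#if defined(__i386__)
    __asm__ __volatile__("lock; orl $0, -4(%%esp)" ::: "memory", "cc");
#elif defined(__x86_64__)
    /* Assumption: same idea with %rsp; not part of the original proposal. */
    __asm__ __volatile__("lock; orl $0, -4(%%rsp)" ::: "memory", "cc");
#else
    /* Fallback for non-x86 targets: let the compiler pick the fence. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Hypothetical Dekker-style flags, only to show where the barrier goes. */
static volatile int flag_a;
static volatile int flag_b;

/* Store our flag, then load the other side's flag.  The barrier keeps
   the load from being satisfied before the store is globally visible. */
static int enter_a(void)
{
    flag_a = 1;
    storeload_barrier();
    return flag_b;
}
```

In a real two-thread test a second function would mirror enter_a with the flags swapped; with the barrier in place, both sides should never read 0 at the same time.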