On Fri, Feb 17, 2017 at 5:59 PM, Uros Bizjak <[email protected]> wrote: > On Fri, Feb 17, 2017 at 5:30 PM, Jakub Jelinek <[email protected]> wrote: >> On Sun, May 29, 2016 at 11:10:15PM +0200, Uros Bizjak wrote: >>> As explained in PR71245, comment #3 [1], it is better to use offset -4 >>> to a %esp to implement a non-SSE memory fence instruction: >>> >>> -q- >>> >>> I guess it costs a code byte for a disp8 in the addressing mode, but >>> it avoids adding a lot of latency to a critical path involving a >>> spill/reload to (%esp), in functions where there is something at >>> (%esp). >>> >>> If it's an object larger than 4B, the lock orl could even cause a >>> store-forwarding stall when the object is reloaded. (e.g. a double or >>> a vector). >>> >>> Ideally we could do the lock orl on some padding between two locals, >>> or on something in memory that wasn't going to be loaded soon, to >>> avoid touching more stack memory (which might be in the next page >>> down). But we still want to do it on a cache line that's hot, so >>> going way up above our own stack frame isn't good either. >> >> Unfortunately this makes valgrind unhappy about that: >> https://bugzilla.redhat.com/show_bug.cgi?id=1423434 >> I assume it will complain now on anything pre-SSE2 that contains the memory >> barrier in 32-bit code. >> Perhaps we should decrement and increment %esp around it or something >> similar (or push/pop)? Of course, that would mean we need to take care >> of async unwind info. > > Or, we can simply revert the patch? Not that the barrier performance > of non-SSE 32bit targets matter...
Attached patch was committed to mainline to revert 2016-05-30 change. 2017-02-19 Uros Bizjak <[email protected]> Revert: 2016-05-30 Uros Bizjak <[email protected]> * config/i386/sync.md (mfence_nosse): Use "lock orl $0, -4(%esp)". Bootstrapped and regression tested on x86_64-linux-gnu {,-m32} Uros. --cut here-- Index: config/i386/sync.md =================================================================== --- config/i386/sync.md (revision 245574) +++ config/i386/sync.md (working copy) @@ -98,7 +98,7 @@ (unspec:BLK [(match_dup 0)] UNSPEC_MFENCE)) (clobber (reg:CC FLAGS_REG))] "!(TARGET_64BIT || TARGET_SSE2)" - "lock{%;} or{l}\t{$0, -4(%%esp)|DWORD PTR [esp-4], 0}" + "lock{%;} or{l}\t{$0, (%%esp)|DWORD PTR [esp], 0}" [(set_attr "memory" "unknown")]) --cut here--
