[Mingw-w64-public] __faststorefence and mingw-w64 philosophy

dw Sun, 02 Jun 2013 19:27:21 -0700

So, a question about __faststorefence. The current implementation in winnt.h is incorrect. I have 3 alternatives to propose, and which one is "best" depends on the goals of the mingw-w64 project. One approach is "just do what MSVC does." However, there's also something to be said for "generate the fastest possible code." And for completeness, there's also "use a built-in." Details with pros/cons below.


First, the current code:


    __MINGW_INTRIN_INLINE void __faststorefence(void) {
      __asm__ __volatile__ ("" ::: "memory");
    }

While the "memory" clobber acts as a compiler barrier (the equivalent of MSVC's _ReadWriteBarrier()), __faststorefence must also generate some type of fence instruction for the processor (sfence, mfence, lock); see <http://msdn.microsoft.com/en-us/library/t710k390%28v=VS.80%29.aspx>.


So, we can:

1) Just map this to __sync_synchronize() (<http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/_005f_005fsync-Builtins.html#_005f_005fsync-Builtins>). This does a full compiler memory barrier and generates an mfence instruction.

pros:
- Uses builtin.

cons:
- Generates mfence instead of sfence (see timing #'s below).
- Generates mfence even if compiled with -mno-sse (mfence is SSE2).
- Generates mfence instead of the "lock or DWORD PTR [rsp], 0" which MSVC generates.
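For reference, option 1 amounts to a one-line wrapper around the GCC builtin. The sketch below assumes GCC or a compatible compiler; `faststorefence_sync` and `publish` are hypothetical names chosen so nothing collides with the winnt.h definition, and the publish pattern is only an illustration of where such a fence typically sits.

```c
/* Option 1 sketch: delegate to the GCC __sync builtin.
 * faststorefence_sync is a hypothetical name for illustration only. */
static inline void faststorefence_sync(void)
{
    /* Full barrier: the compiler may not move memory accesses across it,
     * and GCC also emits a hardware fence (mfence on x86-64, per the
     * message above). */
    __sync_synchronize();
}

/* Hypothetical usage: write the data, fence, then set the flag a reader
 * would poll. */
static int payload;
static volatile int ready;

static void publish(int value)
{
    payload = value;
    faststorefence_sync();
    ready = 1;
}
```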

2) Map this to the same as MSVC. The "memory" clobber ensures the compiler barrier, and the "lock" provides the fence:


    __asm__ __volatile__ ("lock orl %[zero], (%%rsp)" :: [zero] "ri" (0) : "memory", "cc");

pros:
- consistent with MSVC.

cons:
- While sfence may have been slower when first introduced, it's faster than "or" now (see #'s below).
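Option 2 can be wrapped as a complete inline function. A minimal sketch, assuming GCC-style inline asm on x86-64 (the MSVC sequence addresses the stack through rsp, so it only makes sense there); `faststorefence_lockor` is a hypothetical name, and the non-x86-64 fallback exists only so the sketch builds elsewhere.

```c
/* Option 2 sketch: mirror MSVC's "lock or DWORD PTR [rsp], 0".
 * faststorefence_lockor is a hypothetical name for illustration only. */
static inline void faststorefence_lockor(void)
{
#if defined(__x86_64__)
    /* OR-ing zero into the dword at the top of the stack leaves memory
     * unchanged; the lock prefix makes it a full hardware barrier.
     * The "l" suffix fixes the operand size (DWORD), "memory" makes it a
     * compiler barrier, and "cc" because OR writes the flags. */
    __asm__ __volatile__ ("lock orl $0, (%%rsp)" ::: "memory", "cc");
#else
    __sync_synchronize(); /* fallback so the sketch compiles on other targets */
#endif
}
```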


3) Use code like:

    __MINGW_INTRIN_INLINE void __faststorefence(void) {
#ifdef __SSE__ // defined by gcc when sse instructions are available
      __asm__ __volatile__ ("sfence" ::: "memory");
#else
      __asm__ __volatile__ ("lock orl %[zero], (%%rsp)" :: [zero] "ri" (0) : "memory", "cc");
#endif
    }

Pros:
- Uses the faster sfence if available.
- Falls back to "or" for max compatibility.

cons:
- Not consistent with MSVC.
- SFENCE is not necessarily the fastest on all processors.
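Because option 3 picks its instruction at compile time, it can help to check which branch a given set of flags selects (for example, building with -mno-sse should take the fallback). A minimal sketch using a hypothetical helper, `fence_variant()`, that mirrors the `#ifdef __SSE__` test above; it is not part of winnt.h. One point worth keeping in mind when comparing the branches: sfence orders only stores, whereas mfence and lock-prefixed instructions also order loads.

```c
/* Hypothetical helper (not part of winnt.h): report which branch of the
 * option 3 #ifdef the current compiler flags select. */
static const char *fence_variant(void)
{
#if defined(__SSE__)
    return "sfence";   /* gcc defines __SSE__ when SSE is enabled */
#else
    return "lock or";  /* e.g. when built with -mno-sse */
#endif
}
```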

I ran some timings using x64 on my i7, and this is what I find:

_mm_sfence:  3,589,817,193
lock or   : 14,960,719,245
_mm_mfence: 19,608,594,657

Obviously these results are going to be both highly hardware-specific and dependent on the code surrounding them. Still...
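For anyone who wants to gather comparable numbers on their own hardware, here is a minimal timing loop. This is a sketch under stated assumptions, not the harness behind the figures above: it uses the standard clock() (processor time, in seconds), calls each fence through a function pointer (so the indirect call is included in every iteration), and the function names are hypothetical. Non-x86 targets fall back to __sync_synchronize so it still builds.

```c
#include <time.h>

/* Hypothetical timing sketch: call a fence function many times and
 * measure elapsed processor time with the standard clock(). */
typedef void (*fence_fn)(void);

static void fence_sfence(void)
{
#if defined(__SSE__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__ ("sfence" ::: "memory");
#else
    __sync_synchronize();   /* fallback on targets without sfence */
#endif
}

static void fence_lockor(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__ ("lock orl $0, (%%rsp)" ::: "memory", "cc");
#else
    __sync_synchronize();   /* fallback on targets without %rsp */
#endif
}

static void fence_mfence(void)
{
    __sync_synchronize();   /* option 1: mfence on x86-64 per the message */
}

/* Returns elapsed processor time in seconds for `iterations` calls. */
static double time_fence(fence_fn fn, long iterations)
{
    volatile int sink = 0;
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        sink = (int)i;      /* give the fence a store to order */
        fn();
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_fence(fence_sfence, 100000000L)` and the same for the other two gives relative figures; as noted above, expect them to vary by CPU and by the surrounding code.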

If I were going to pick, I'd probably go with #3. It isn't 100% identical to MSVC, but it effectively produces the same results, and will (at least on current processors) generate faster code.


Opinions?

dw

_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public