2013/6/24 dw <[email protected]>:
> So, I have finished the changes I had in mind for this. I have tried to be
> clear about how I intend things to work (IOW, there's lots of comments).
> Changing headers/includes without breaking something elsewhere can be
> challenging, but I'm pretty sure I've got this right. On the plus side, if
> something's wrong, it shows up as a compile/link error, so it's easy to
> spot.
>
> The patch is attached. Note that there is an updated Makefile.am, so you'll
> need to regenerate the corresponding Makefile.in. Here's what it includes:
>
> -----------------------------------------------------------------------
> - Move all the implementations for the intrinsics I have been working on
> from winnt.h to (new file) psdk_inc/intrin-impl.h.
> - Use __MINGW_INTRIN_INLINE instead of __CRT_INLINE for functions in
> psdk_inc/intrin-impl.h.
> - Add comments to intrin.h describing how the MSVC intrinsics work in gcc.
> - Add include for intrin-impl.h to intrin.h, protected by #ifdef.
> - Since winnt.h sometimes includes intrin.h, only declare the prototypes for
> intrinsics (the ones that winnt.h has always declared) if intrin.h isn't
> being included.
> - Use the same cygwin logic in winnt.h for both x86 and x64.
> - Make the corresponding changes to the files in mingw-w64-crt\intrincs to
> use the new approach in intrin-impl.h.
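The header layering in the list above (a full inline definition when intrin.h is included, only a prototype from winnt.h otherwise) can be sketched as below. This is a minimal illustration, not the patch itself: `SKETCH_INTRIN_H` is a hypothetical stand-in for whatever guard intrin.h really defines, `sketch_stosb` is a hypothetical stand-in for one of the intrinsics, and the body is plain C rather than the real asm so it runs anywhere.

```c
#include <stddef.h>

/* ---- stand-in for intrin.h, which pulls in intrin-impl.h ---- */
#define SKETCH_INTRIN_H 1

/* Inline definition, as intrin-impl.h would provide it. The real code
   would use __MINGW_INTRIN_INLINE and an asm body; this is plain C. */
static __inline__ __attribute__((__always_inline__))
void sketch_stosb(unsigned char *dst, unsigned char val, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = val;
}

/* ---- stand-in for winnt.h ---- */
#ifndef SKETCH_INTRIN_H
/* intrin.h was not included: declare the prototype only, and let the
   out-of-line copy in the CRT's intrincs library satisfy the call. */
void sketch_stosb(unsigned char *dst, unsigned char val, size_t n);
#endif
```

Because the guard is defined first, the winnt.h stand-in skips its prototype and callers get the inline body; dropping the `#define` line would make the same call resolve against a library copy instead.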
>
> It also includes the work from
> https://sourceforge.net/mailarchive/message.php?msg_id=31051013:
>
> 1) The existing code for __faststorefence doesn't actually generate a fence.
> It only generates a compiler barrier. This patch maps __faststorefence to
> _mm_sfence (instead of using MS's "lock or" trick). sfence appears to be
> faster than MS's "fast" approach on modern processors.
>
> 2) MS's MemoryBarrier (which is supposed to be a full compiler barrier plus
> a processor fence) maps to __faststorefence on x64. This works for MS
> because their __faststorefence trick ends up generating a full fence plus a
> full barrier.
> Since our __faststorefence now uses sfence, this patch adjusts MemoryBarrier
> to use _mm_mfence().
>
> 3) While there is a prototype for _ReadWriteBarrier in winnt.h, there is no
> implementation. Since _ReadWriteBarrier is -only- a compiler directive
> (rather like #pragma), there is no way to place it in a library. As a
> result, this patch implements it with a #define in both winnt.h & intrin.h.
>
> 4) GCC doesn't actually support _ReadBarrier() and _WriteBarrier(). This
> patch maps them to _ReadWriteBarrier() with a #define in both winnt.h &
> intrin.h.
>
> 5) While there is a prototype for __int2c in winnt.h and intrin.h, there is
> no implementation. Since MS docs say this is only available as an
> intrinsic (what GCC calls a builtin), this patch defines it with a macro in
> both winnt.h and intrin.h. (Update: this is now done as an inline routine
> plus a library version.)
>
> 6) The code for DbgRaiseAssertionFailure won't compile with -masm=intel.
> Use __buildint from intrin-mac.h.
>
> 7) Add __buildint to intrin-mac.h for DbgRaiseAssertionFailure & __int2c.
>
> 8) On x86, if SSE2 is available, use _mm_pause for YieldProcessor and
> _mm_mfence for MemoryBarrier. If SSE2 is not available, emit the
> equivalent inline asm ("rep nop" for pause and "xchg" for MemoryBarrier).
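Item 1 contrasts two ways to implement __faststorefence. The sketch below puts both side by side, assuming GCC on x86/x64 with AT&T asm syntax; the function names are hypothetical and neither body is copied from the patch or from MS's headers.

```c
#include <xmmintrin.h>   /* _mm_sfence; needs SSE (always available on x64) */

/* What the patch maps __faststorefence to: sfence makes earlier stores
   globally visible before later stores. It does not order a store
   against a later load. */
static __inline__ void faststorefence_sfence(void)
{
    _mm_sfence();
}

/* The MS-style "lock or" alternative: a locked read-modify-write that ORs
   zero into the top of the stack, leaving the value unchanged. On x86 any
   locked instruction is a full fence, and the "memory" clobber also makes
   it a compiler barrier. AT&T syntax only. */
static __inline__ void faststorefence_lock_or(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__ ("lock; orl $0, (%%rsp)" ::: "memory", "cc");
#else
    __asm__ __volatile__ ("lock; orl $0, (%%esp)" ::: "memory", "cc");
#endif
}
```

The difference matters for item 2: only the locked form also orders a store against a later load, which is why MemoryBarrier can no longer reuse __faststorefence once it becomes sfence.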
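Items 2 and 8 together describe how MemoryBarrier and YieldProcessor are chosen. A rough sketch of that selection follows. The `SKETCH_` names are hypothetical, the `xchg` fallback is one plausible spelling rather than the patch's exact asm, and the non-x86 branch (GCC's `__sync_synchronize`) exists only so the sketch compiles elsewhere. The Dekker-style function shows the store-then-load case that needs a full fence.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
/* SSE2 path: pause instruction and a full mfence. */
#define SKETCH_YieldProcessor() _mm_pause()
#define SKETCH_MemoryBarrier()  _mm_mfence()
#elif defined(__i386__) || defined(__x86_64__)
/* "rep; nop" encodes the same bytes as pause (F3 90) and executes as a
   plain nop on CPUs that predate it. */
#define SKETCH_YieldProcessor() __asm__ __volatile__ ("rep; nop" ::: "memory")
/* xchg with a memory operand is implicitly locked, so it is a full fence. */
static __inline__ void SKETCH_MemoryBarrier(void)
{
    int scratch = 0, reg = 0;
    __asm__ __volatile__ ("xchg %0, %1" : "+r" (reg), "+m" (scratch) : : "memory");
}
#else
/* Non-x86 fallback, only so the sketch still builds. */
#define SKETCH_YieldProcessor() ((void)0)
#define SKETCH_MemoryBarrier()  __sync_synchronize()
#endif

static volatile int flag[2];

/* One side of a Dekker-style handshake: raise our flag, then read the
   other side's. x86 may let this load complete before the earlier store
   is visible, so a full fence is required; sfence would not prevent it. */
static int announce_and_check(int self)
{
    flag[self] = 1;
    SKETCH_MemoryBarrier();
    return flag[1 - self] == 0;   /* 1: the other side had not announced yet */
}

/* Spin until *p is nonzero or max_spins is used up; returns spins taken. */
static int spin_wait(volatile int *p, int max_spins)
{
    int spins = 0;
    while (!*p && spins < max_spins) {
        SKETCH_YieldProcessor();
        spins++;
    }
    return spins;
}
```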
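Items 3 and 4 define the compiler-only barriers with macros. The sketch below uses the same `_ReadWriteBarrier` definition quoted later in this thread; the `_ReadBarrier`/`_WriteBarrier` mappings follow item 4, while `publish` and `try_consume` are hypothetical helpers. These macros constrain only the compiler. The publish/consume pattern holds on x86 because the hardware does not reorder stores with other stores or loads with other loads, but a weakly ordered CPU would also need a real fence.

```c
/* Compiler-only barrier, written as a macro: the empty asm emits no
   instruction, but the "memory" clobber makes GCC assume memory may have
   changed, so it must not cache values across the barrier or move
   loads/stores past it. */
#define _ReadWriteBarrier() __asm__ __volatile__ ("" ::: "memory")

/* GCC has no narrower read-only or write-only compiler barrier, so both
   map to the full one (stronger than required, never weaker). */
#define _ReadBarrier()  _ReadWriteBarrier()
#define _WriteBarrier() _ReadWriteBarrier()

static int payload;   /* deliberately not volatile */
static int ready;

static void publish(int value)
{
    payload = value;
    _WriteBarrier();          /* payload store is emitted before the ready store */
    ready = 1;
}

/* Polls up to max_polls times; returns 1 and sets *out once ready is seen. */
static int try_consume(int *out, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (ready) {
            _ReadBarrier();   /* ready load is emitted before the payload load */
            *out = payload;
            return 1;
        }
        _ReadWriteBarrier();  /* forces ready to be reloaded on the next pass */
    }
    return 0;
}
```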
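Items 5 through 7 depend on emitting an `int 0x2c` that assembles under both -masm=att and -masm=intel. One way to do this is GCC's `{att|intel}` dialect-alternative syntax in an extended asm template. The sketch assumes __buildint works roughly this way; `SKETCH_BUILDINT` and `sketch_int2c` are hypothetical names. On Windows, `int 0x2c` raises an assertion-failure exception, which is what __int2c and DbgRaiseAssertionFailure are for. On other systems it will typically just fault, so the check only takes the function's address and never calls it.

```c
/* "int {$}0x2c" is printed as "int $0x2c" under -masm=att and as
   "int 0x2c" under -masm=intel: {$} is a one-entry dialect alternative.
   The trailing ':' makes this extended asm, which is required for the
   {...} alternatives to be interpreted. */
#define SKETCH_BUILDINT(n) __asm__ __volatile__ ("int {$}" #n :)

/* Inline form of an __int2c-style intrinsic; a DbgRaiseAssertionFailure
   equivalent could reuse the same macro. Only call this in a Windows
   process that expects the assertion exception. */
static __inline__ void sketch_int2c(void)
{
    SKETCH_BUILDINT(0x2c);
}
```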
> -----------------------------------------------------------------------
>
> If I were to change one thing, it would probably be to remove the #ifdef
> around the intrin-impl.h include in intrin.h. Why? When I tried to write
> the comment about when you might want to use __INTRINSIC_LIBRARY_ONLY, I
> couldn't come up with a single case. Adding complexity with no corresponding
> benefit is something I try to avoid.
Hmm, fair point.
> And while I'd *like* to change the definition for _ReadWriteBarrier from:
>
> #define _ReadWriteBarrier() __asm__ __volatile__ ("" ::: "memory")
>
> to
>
> extern __inline__ __attribute__((__always_inline__,__gnu_inline__))
> void _ReadWriteBarrier(void)
> {
>     __asm__ __volatile__ ("" ::: "memory");
> }
>
> I can't completely convince myself they are -exactly- the same. Using an
> empty asm block is a tricky thing. For example, since there is no actual
> asm output, you cannot put this in a library and expect it to work. What's
> more, simply by virtue of the fact that you are calling a routine, you can
> implicitly get some of the effects of the memory barrier. But just because
> sometimes it looks like it might be working isn't the same as it being
> right.
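To make the concern concrete, the sketch below shows both spellings side by side. The names are hypothetical, and the function uses `static` instead of the `extern __gnu_inline__` form from the mail so the file stands alone. When the function really is inlined, both expand to the same empty asm with a "memory" clobber. If an opaque, non-inlined call were emitted instead, the call itself would make GCC assume that escaped memory may have changed. That makes it hard to tell whether the barrier is doing the work, which is the ambiguity dw describes.

```c
/* Macro form, as the patch defines _ReadWriteBarrier. */
#define RWB_MACRO() __asm__ __volatile__ ("" ::: "memory")

/* Function form (the proposed alternative), forced inline so the asm
   lands at the call site rather than in a separate routine. */
static __inline__ __attribute__((__always_inline__))
void rwb_inline(void)
{
    __asm__ __volatile__ ("" ::: "memory");
}

static int counter;

/* Each barrier forces counter to be stored before it and reloaded after
   it, so the increments cannot be folded into a single "+= 2". */
static int bump_twice(void)
{
    counter++;
    RWB_MACRO();
    counter++;
    rwb_inline();
    return counter;
}
```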
Yes, that is a general problem with this kind of barrier. As long as
it is inlined, we are on the safe side. As a real, out-of-line function
call, it might act as more of a barrier than intended, which is not what
the user expects.
> dw
OK, the patch is fine. I would like Jacek to take a closer look at it, too.
Thanks,
Kai
_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public