[Cython] PR on refcounting memoryview buffers

Sturla Molden Mon, 18 Feb 2013 11:09:24 -0800

As Stefan suggested, I have posted a PR with a better fix for the issue where MinGW for some reason emits the symbol "__sync_fetch_and_add_4" instead of generating an atomic opcode for the __sync_fetch_and_add builtin.


The PR is here:
https://github.com/cython/cython/pull/185


The discussion probably belongs on this list rather than on the Cython users list:

The problem this addresses occurs when GCC does not use atomic builtins and instead emits calls to __sync_fetch_and_add_4 and __sync_fetch_and_sub_4 where Cython internally refcounts memoryview buffers. For some reason it can even happen on x86 and amd64.

My PR undoes Mark's quick fix, which always uses PyThread_acquire_lock on MinGW. PyThread_acquire_lock uses a kernel object (a semaphore) on Windows and is not very efficient. I want slicing memoryviews to be fast, and that means PyThread_acquire_lock must go. My PR uses the Windows API atomic function InterlockedAdd to implement the semantics of __sync_fetch_and_add_4 and __sync_fetch_and_sub_4 instead of using a Python lock.
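To make the idea concrete, here is a minimal sketch of fetch-and-add semantics built on a Windows Interlocked function. It is not the code from the PR. The names demo_count_t, demo_fetch_and_add, demo_fetch_and_sub and demo_refcount_roundtrip are hypothetical, and the sketch uses InterlockedExchangeAdd rather than InterlockedAdd because InterlockedExchangeAdd returns the value before the addition, which is what the GCC builtin returns. The non-Windows branch exists only so the sketch compiles elsewhere.

```c
#include <assert.h>

#if defined(_WIN32)
#include <windows.h>
typedef LONG demo_count_t;   /* hypothetical typedef, for illustration only */

/* Returns the value stored in *p before v was added (fetch-and-add). */
static __inline__ __attribute__((always_inline))
demo_count_t demo_fetch_and_add(volatile demo_count_t *p, demo_count_t v)
{
    return InterlockedExchangeAdd(p, v);
}
#else
typedef int demo_count_t;

/* Stand-in for other platforms: just use the GCC builtin. */
static __inline__ __attribute__((always_inline))
demo_count_t demo_fetch_and_add(volatile demo_count_t *p, demo_count_t v)
{
    return __sync_fetch_and_add(p, v);
}
#endif

/* Subtraction is addition of the negated amount. */
static __inline__ __attribute__((always_inline))
demo_count_t demo_fetch_and_sub(volatile demo_count_t *p, demo_count_t v)
{
    return demo_fetch_and_add(p, -v);
}

/* Acquire two references, release one, and return the remaining count. */
static int demo_refcount_roundtrip(void)
{
    volatile demo_count_t refs = 0;
    demo_count_t old;

    old = demo_fetch_and_add(&refs, 1);   /* first acquire */
    assert(old == 0);
    old = demo_fetch_and_add(&refs, 1);   /* second acquire */
    assert(old == 1);
    old = demo_fetch_and_sub(&refs, 1);   /* one release */
    assert(old == 2);
    (void)old;
    return (int)refs;
}
```

No kernel object is involved on either branch, which is the point of replacing PyThread_acquire_lock.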

Usually MinGW is configured to compile GNU atomic builtins correctly. I have yet to see a case where it is not. But obviously one user (JF Gallant) has encountered it. I don't think it is a MinGW-specific problem, but so far it has only been seen on MinGW, and the fix is MinGW-specific (well, it should work on Cygwin too). Whenever MinGW does support the atomic builtins, it simply uses them, so the fix incurs no speed penalty on well-behaved MinGW builds.

I took the liberty of using the GNU extensions __inline__ and __attribute__((always_inline)). They make sure the functions always behave like macros. The rationale is that this is GCC-specific code, so we can assume GNU extensions are available. If we take them away the code should still work, but we have no guarantee that the functions will be inlined. I did not use macros because __sync_fetch_and_add is emitted by the preprocessor, and thus GCC will presumably emit __sync_fetch_and_sub_4 after the preprocessing step, which could require __sync_fetch_and_sub_4 to be a function instead of another macro. (I have no way of finding out, since I cannot test for it.)




Regarding Linux and OSX:

Failure of GCC to use atomic builtins could also happen on other GCC builds though. I don't think it is a MinGW-only issue. It's probably due to how the GCC build was configured. So we should as a safeguard have this for other OSes too.


http://developer.apple.com/library/ios/#DOCUMENTATION/System/Conceptual/ManPages_iPhoneOS/man3/OSAtomicAdd32.3.html

We probably just need similar code to what I wrote for MinGW. I can write the code, but I don't have a Mac on which to test it.

Also we should use OSAtomic* on clang/LLVM, which is now the platform C compiler on OSX. This will avoid PyThread_acquire_lock being the common synchronization mechanism for refcounting memoryview buffers on OSX.
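A rough sketch of what the OSX variant could look like follows. It is untested on a Mac, and the name demo_osx_fetch_and_add is hypothetical. OSAtomicAdd32Barrier returns the new value, so the old value is recovered by subtracting the amount. The Barrier variant is chosen on the assumption that we want to match the full-barrier behaviour of the GCC __sync builtins. The non-Apple branch is only a stand-in so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__APPLE__)
#include <libkern/OSAtomic.h>

/* Returns the value stored in *p before v was added. */
static __inline__ __attribute__((always_inline))
int32_t demo_osx_fetch_and_add(volatile int32_t *p, int32_t v)
{
    /* OSAtomicAdd32Barrier yields the NEW value; subtract v for the old one. */
    return OSAtomicAdd32Barrier(v, p) - v;
}
#else
/* Stand-in for non-Apple platforms: use the GCC builtin directly. */
static __inline__ __attribute__((always_inline))
int32_t demo_osx_fetch_and_add(volatile int32_t *p, int32_t v)
{
    return __sync_fetch_and_add(p, v);
}
#endif

/* Fetch-and-sub expressed through the same helper. */
static __inline__ __attribute__((always_inline))
int32_t demo_osx_fetch_and_sub(volatile int32_t *p, int32_t v)
{
    return demo_osx_fetch_and_add(p, -v);
}
```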

On Linux I am not sure what to suggest if GCC fails to use atomic builtins. I can handcode inline assembly for x86/amd64. I could also use pthreads and pth thread locks. But we could also assume that it never happens and just let the linker fail on __sync_fetch_and_add_4.
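For illustration, the two Linux options could be sketched roughly as below: a "lock xadd" in GCC extended inline assembly on x86/amd64, and a pthread mutex everywhere else. The names demo_linux_fetch_and_add and demo_lock are hypothetical, and this is a sketch of the technique rather than proposed code for Cython.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

/* x86/amd64: "lock xadd" atomically adds v to *p and leaves the
 * previous value of *p in the register that held v. */
static __inline__ __attribute__((always_inline))
int32_t demo_linux_fetch_and_add(volatile int32_t *p, int32_t v)
{
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(v), "+m"(*p)
                         :
                         : "memory", "cc");
    return v;   /* now holds the old value of *p */
}

#else

#include <pthread.h>

/* Other architectures: serialize through one process-wide mutex.
 * Correct, but much slower than an atomic instruction. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

static int32_t demo_linux_fetch_and_add(volatile int32_t *p, int32_t v)
{
    int32_t old;
    pthread_mutex_lock(&demo_lock);
    old = *p;
    *p = old + v;
    pthread_mutex_unlock(&demo_lock);
    return old;
}

#endif
```

Passing a negative amount gives fetch-and-sub, so one helper covers both symbols.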




Sturla
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
