On 06/14/2016 01:05 PM, Nathaniel Smith wrote:

On Jun 14, 2016 12:38 PM, "Burlen Loring" <blor...@lbl.gov> wrote:
>
> On 06/14/2016 12:28 PM, Julian Taylor wrote:
>>
>> On 14.06.2016 19:34, Burlen Loring wrote:
>>
>>>
>>> here's my question: given Py_BEGIN_ALLOW_THREADS is used by numpy how
>>> can numpy be thread safe? and how can someone using the C-API know where
>>> it's necessary to acquire the GIL? Maybe someone can explain this?
>>>
>>
>> numpy only releases the GIL when it is not accessing any python objects or other non-threadsafe structures anymore.
>> That is usually during computation loops and IO.
>>
>>
>> Your problem is indeed a missing PyGILState_Ensure
>>
>> I am assuming that the threads you are using are not created by python, so you don't have a threadstate setup and no GIL.
>> You do set it up with that function, see https://docs.python.org/2/c-api/init.html#non-python-created-threads
>
> I'm already holding the GIL in each thread via the mechanism you pointed to, and I have verified this with gdb; somehow the GIL is being released. Re-acquiring the GIL solves the issue, but technically acquiring it twice in the same thread should cause a deadlock. I suspect Numpy's use of Py_BEGIN_ALLOW_THREADS is the cause of the issue. It will take some work to verify.

It's legal to call PyGILState_Ensure when you already have the GIL; the whole point of that function is that you can use it whether you have the GIL or not. However, if you already have the GIL, then it's a no-op, so it shouldn't have fixed your problems. If it did help, then this strongly suggests that you've missed something in your analysis of when you hold the GIL.
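A minimal sketch of that point, assuming CPython and using ctypes only as a convenient way to reach the C-API from a script. ctypes.pythonapi keeps the GIL held during the call, so each PyGILState_Ensure below runs in a thread that already holds it. The helper name nested_ensure_states is made up for this illustration, and the constant 0 reflects the order of the PyGILState_STATE enum in CPython's headers (PyGILState_LOCKED first, PyGILState_UNLOCKED second).

```python
import ctypes

# Assumption: CPython. PyGILState_STATE is a C enum whose first member,
# PyGILState_LOCKED, has the value 0 (PyGILState_UNLOCKED is 1).
PyGILState_LOCKED = 0

# ctypes.pythonapi is a PyDLL: the GIL is NOT released around these calls,
# so the calling thread holds the GIL when PyGILState_Ensure runs.
api = ctypes.pythonapi
api.PyGILState_Ensure.restype = ctypes.c_int
api.PyGILState_Ensure.argtypes = []
api.PyGILState_Release.restype = None
api.PyGILState_Release.argtypes = [ctypes.c_int]


def nested_ensure_states(depth):
    """Call PyGILState_Ensure `depth` times in a row, then undo each call."""
    states = [api.PyGILState_Ensure() for _ in range(depth)]
    # Every Ensure must be matched by a Release, in reverse order.
    for state in reversed(states):
        api.PyGILState_Release(state)
    return states


if __name__ == "__main__":
    # No deadlock: each call reports that the GIL was already held.
    print(nested_ensure_states(3))
```

The nested calls return instead of deadlocking, and each one reports PyGILState_LOCKED, meaning the GIL was already held on entry.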

While bugs are always possible, it's unlikely that this has anything to do with numpy using Py_BEGIN_ALLOW_THREADS. In theory numpy's use is safe, because it always follows the pattern of dropping the GIL, doing a chunk of work that is careful not to touch any globals or the python api, and then reacquiring the GIL. In practice it's possible that the code does something else in some edge case, but if so then it's a pretty subtle issue that's being triggered by some unusual thing about how you call into numpy.


Thank you guys for the feedback and being a sounding board for my explorations and ideas.

I think I got to the bottom of it. I think you are right: it has nothing to do with numpy. Also, I am indeed acquiring the GIL before invoking the callback and releasing it afterward, which is the right thing to do. However, it turns out SWIG brackets wrapped C++ code with Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS blocks, so any call through SWIG-wrapped code from within the callback releases the GIL! I guess this normally wouldn't be an issue, except that I have used %extend directives, along with the Python and Numpy C-APIs, in a bunch of places to provide a Python-specific interface to our data structures or to do things more seamlessly and/or beyond what's possible with typemaps. SWIG releases the GIL prior to invoking my extensions, which then hit the C-API, and chaos ensues. I think the solution is to acquire the GIL again in all of these extensions where I touch the Python C-API. It seems to have solved the problem!
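The same release-then-reacquire shape can be seen without SWIG, as a rough analogy only. ctypes.CDLL is documented to release the GIL around each foreign call, and ctypes callbacks take the GIL again before running Python code. The sketch below assumes CPython on a POSIX system where ctypes.CDLL(None) exposes libc's qsort; the names compare, sort_ints and gil_states are made up for this illustration, and none of this is SWIG's generated code.

```python
import ctypes

# Assumption: POSIX. CDLL(None) opens the running process, which exposes
# libc's qsort. CDLL releases the GIL around every foreign call.
libc = ctypes.CDLL(None)

# PyDLL handle: calls keep the GIL held (used here only to inspect state).
api = ctypes.pythonapi
api.PyGILState_Ensure.restype = ctypes.c_int
api.PyGILState_Ensure.argtypes = []
api.PyGILState_Release.restype = None
api.PyGILState_Release.argtypes = [ctypes.c_int]

IntPtr = ctypes.POINTER(ctypes.c_int)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int, IntPtr, IntPtr)
libc.qsort.restype = None
libc.qsort.argtypes = [IntPtr, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]

gil_states = []  # one entry per comparator call; 0 means PyGILState_LOCKED


def compare(a, b):
    # ctypes re-acquired the GIL before entering this callback, so a further
    # Ensure reports "already held"; it is released again right away.
    state = api.PyGILState_Ensure()
    gil_states.append(state)
    api.PyGILState_Release(state)
    return (a[0] > b[0]) - (a[0] < b[0])


def sort_ints(values):
    """Sort with libc qsort; the comparator is Python code run under the GIL."""
    arr = (ctypes.c_int * len(values))(*values)
    callback = CMPFUNC(compare)  # keep a reference alive during the call
    libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), callback)
    return list(arr)


if __name__ == "__main__":
    print(sort_ints([5, 1, 7, 33, 99]))
    print(set(gil_states))
```

In a hand-written %extend body there is no such automatic re-acquisition, which is why the extension code itself has to bracket its C-API calls with PyGILState_Ensure and PyGILState_Release.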

Thanks and regrets for all the discussion on the numpy list which probably belongs in the swig list.

Burlen
