On 06/14/2016 01:05 PM, Nathaniel Smith wrote:
On Jun 14, 2016 12:38 PM, "Burlen Loring" <blor...@lbl.gov> wrote:
>
> On 06/14/2016 12:28 PM, Julian Taylor wrote:
>>
>> On 14.06.2016 19:34, Burlen Loring wrote:
>>
>>>
>>> Here's my question: given that numpy uses Py_BEGIN_ALLOW_THREADS, how
>>> can numpy be thread safe? And how can someone using the C-API know
>>> where it's necessary to acquire the GIL? Maybe someone can explain this?
>>>
>>
>> numpy only releases the GIL when it is no longer accessing any Python
>> objects or other non-thread-safe structures. That is usually during
>> computation loops and IO.
>>
>>
>> Your problem is indeed a missing PyGILState_Ensure.
>>
>> I am assuming that the threads you are using were not created by
>> Python, so they have no thread state set up and do not hold the GIL.
>> PyGILState_Ensure sets that up for you; see
>> https://docs.python.org/2/c-api/init.html#non-python-created-threads
>
> I'm already holding the GIL in each thread via the mechanism you
> pointed to, and I have verified this with gdb, yet somehow the GIL is
> being released. Re-acquiring the GIL solves the issue, but technically
> acquiring it twice in the same thread should cause a deadlock. I
> suspect numpy's use of Py_BEGIN_ALLOW_THREADS is the cause of the
> issue. It will take some work to verify.

It's legal to call PyGILState_Ensure when you already have the GIL;
the whole point of that function is that you can use it whether you
have the GIL or not. However, if you already have the GIL, then it's a
no-op, so it shouldn't have fixed your problems. If it did help, then
this strongly suggests that you've missed something in your analysis
of when you hold the GIL.
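That nested PyGILState_Ensure calls are legal, and that the inner call is effectively a no-op, can be checked from Python itself. The sketch below is an illustration, not code from this thread: it calls the real C-API functions through ctypes.pythonapi, which keeps the GIL held during each call, so the calling thread already owns the GIL when PyGILState_Ensure runs. The helper name nested_ensure is hypothetical.

```python
import ctypes

api = ctypes.pythonapi  # PyDLL: ctypes does NOT release the GIL around these calls

# Declare the C signatures (PyGILState_STATE is a C enum, i.e. an int).
api.PyGILState_Ensure.restype = ctypes.c_int
api.PyGILState_Release.argtypes = [ctypes.c_int]
api.PyGILState_Release.restype = None
api.PyGILState_Check.restype = ctypes.c_int

# Values of the PyGILState_STATE enum declared in CPython's headers.
PyGILState_LOCKED, PyGILState_UNLOCKED = 0, 1


def nested_ensure():
    """Hypothetical helper: call PyGILState_Ensure twice while already holding the GIL."""
    held_before = api.PyGILState_Check()
    outer = api.PyGILState_Ensure()  # GIL already held: no deadlock, returns LOCKED
    inner = api.PyGILState_Ensure()  # nesting is still legal
    # Release in reverse order, passing back the state each Ensure returned.
    api.PyGILState_Release(inner)
    api.PyGILState_Release(outer)
    held_after = api.PyGILState_Check()
    return held_before, outer, inner, held_after


print(nested_ensure())
```

On a standard CPython build this should print (1, 0, 0, 1): the GIL is held throughout, and both Ensure calls report PyGILState_LOCKED, meaning they had nothing to acquire.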
While bugs are always possible, it's unlikely that this has anything
to do with numpy using Py_BEGIN_ALLOW_THREADS. In theory numpy's use
is safe, because it always follows the pattern of dropping the GIL,
doing a chunk of work that is careful not to touch any globals or the
Python API, and then reacquiring the GIL. In practice it's possible
that the code does something else in some edge case, but if so then
it's a pretty subtle issue that's being triggered by some unusual
thing about how you call into numpy.

Thank you guys for the feedback and for being a sounding board for my
explorations and ideas.

I think I've got to the bottom of it, and you are right that it has
nothing to do with numpy. Also, I am indeed acquiring the GIL before
invoking the callback and releasing it afterward, which is the right
thing to do. However, it turns out SWIG brackets wrapped C++ code with
Py_BEGIN/END_ALLOW_THREADS blocks, so any call through SWIG-wrapped
code from within the callback releases the GIL! Normally this wouldn't
be an issue, except that I have used %extend directives and called the
Python and NumPy C-APIs in a number of places, to provide a
Python-specific interface to our data structures or to do things more
seamlessly and/or beyond what's possible with typemaps. SWIG releases
the GIL before invoking my extensions, which then call into the C-API,
and chaos ensues. I think the solution is to re-acquire the GIL in
every extension that touches the Python C-API. That seems to have
solved the problem!
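The SWIG behavior described above can be mimicked without SWIG. ctypes releases the GIL around calls made through a CFUNCTYPE prototype, much as a wrapper bracketed by Py_BEGIN/END_ALLOW_THREADS does, but keeps it held for PYFUNCTYPE. The sketch below is an analogy, not SWIG's generated code: it binds the real PyGILState_Check to both prototypes, and the names gil_held_inside, _check_with_gil and _check_without_gil are hypothetical. Native code running inside the GIL-releasing wrapper sees no GIL, which is why C-API calls in %extend code need their own acquire/release.

```python
import ctypes

# Address of the real C-API function PyGILState_Check in the running interpreter.
_addr = ctypes.cast(ctypes.pythonapi.PyGILState_Check, ctypes.c_void_p).value

# The same C function bound with two calling conventions:
#   PYFUNCTYPE -> ctypes keeps the GIL held during the call.
#   CFUNCTYPE  -> ctypes releases the GIL before the call and re-acquires it
#                 afterward, analogous to Py_BEGIN/END_ALLOW_THREADS in a wrapper.
_check_with_gil = ctypes.PYFUNCTYPE(ctypes.c_int)(_addr)
_check_without_gil = ctypes.CFUNCTYPE(ctypes.c_int)(_addr)


def gil_held_inside(release_gil):
    """Hypothetical helper: does native code see the GIL as held (1) or not (0)?"""
    fn = _check_without_gil if release_gil else _check_with_gil
    return fn()


print(gil_held_inside(release_gil=False))  # plain call: GIL held
print(gil_held_inside(release_gil=True))   # GIL-releasing wrapper: GIL not held
```

On a standard CPython build the first call should print 1 and the second 0. In the C extension itself, the corresponding fix is a PyGILState_STATE st = PyGILState_Ensure(); ... PyGILState_Release(st); pair around the C-API calls in each %extend body; because PyGILState_Ensure is safe whether or not the caller already holds the GIL, the same code works whether or not the wrapper released it.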
Thanks, and apologies for all the discussion on the numpy list, which
probably belongs on the SWIG list.
Burlen
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion