Re: [Cython] Acquisition counted cdef classes

2011-10-26 Thread Stefan Behnel

Greg Ewing, 26.10.2011 00:27:
> Dag Sverre Seljebotn wrote:
>
>> I'd gladly take a factor two (or even four) slowdown of CPython code any
>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores
>> and considers a 10x speedup better than nothing...
>
> Another thing to consider is that locking around refcount
> changes may not be as expensive in typical Cython code as
> it is in Python.
>
> The trouble with Python is that you can't so much as scratch
> your nose without touching a big pile of ref counts. But
> if the Cython code is only dealing with a few Python objects
> and doing most of its work at the C level, the relative
> overhead of locking around refcount changes may not be
> significant.
>
> So it may be worth trying the strategy of just acquiring
> the GIL whenever a refcount needs to be changed in a nogil
> section, and damn the consequences.


Hmm, interesting. That would give new semantics to "nogil" sections, basically:

"""
You can do Python interaction in nogil code, however, this will slow down 
your code. Cython will generate C code to acquire and release the GIL 
around any Python interaction that your code performs, thus serialising any 
calls into the CPython runtime. If you want to avoid this serialisation, 
use "cython -a" to find out where Python interaction happens and use static 
typing to let Cython generate C code instead.

"""

In other words: "with gil" sections hold the GIL by default and give it 
away on explicit request, whereas "nogil" sections have the GIL released by 
default and acquire it on implicit need.


The advantage over object level locking is that this does not increase the 
in-memory size of the object structs, and that it works with *any* Python 
object, not just extension types with a compile time known type.
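
A minimal sketch of the "released by default, acquired on implicit need" idea, written as a plain Python model rather than Cython. The lock named gil, the list results and the function nogil_worker are assumptions made up for illustration; a real nogil section would use CPython's own GIL, and this model only shows where the acquisition would happen.

```python
import threading

# Hypothetical stand-in for the GIL; a real nogil section would use
# CPython's own lock, not a Python-level one.
gil = threading.Lock()
results = []  # shared "Python object" that needs the lock


def nogil_worker(values):
    # "C-level" work: runs without the lock, so workers can overlap.
    total = 0
    for v in values:
        total += v * v
    # Python interaction: acquire the lock only here, then release it.
    with gil:
        results.append(total)


threads = [threading.Thread(target=nogil_worker, args=(range(n),))
           for n in (10, 100, 1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock is held only for the single append, so the cost of serialisation scales with the number of Python interactions, not with the amount of C-level work.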


I kind of like that.

Stefan
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Acquisition counted cdef classes

2011-10-26 Thread mark florisson
On 26 October 2011 08:56, Stefan Behnel  wrote:
> Greg Ewing, 26.10.2011 00:27:
>>
>> Dag Sverre Seljebotn wrote:
>>
>>> I'd gladly take a factor two (or even four) slowdown of CPython code any
>>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores
>>> and considers a 10x speedup better than nothing...
>>
>> Another thing to consider is that locking around refcount
>> changes may not be as expensive in typical Cython code as
>> it is in Python.
>>
>> The trouble with Python is that you can't so much as scratch
>> your nose without touching a big pile of ref counts. But
>> if the Cython code is only dealing with a few Python objects
>> and doing most of its work at the C level, the relative
>> overhead of locking around refcount changes may not be
>> significant.
>>
>> So it may be worth trying the strategy of just acquiring
>> the GIL whenever a refcount needs to be changed in a nogil
>> section, and damn the consequences.
>
> Hmm, interesting. That would give new semantics to "nogil" sections,
> basically:
>
> """
> You can do Python interaction in nogil code, however, this will slow down
> your code. Cython will generate C code to acquire and release the GIL around
> any Python interaction that your code performs, thus serialising any calls
> into the CPython runtime. If you want to avoid this serialisation, use
> "cython -a" to find out where Python interaction happens and use static
> typing to let Cython generate C code instead.
> """
>
> In other words: "with gil" sections hold the GIL by default and give it away
> on explicit request, whereas "nogil" sections have the GIL released by
> default and acquire it on implicit need.
>
> The advantage over object level locking is that this does not increase the
> in-memory size of the object structs, and that it works with *any* Python
> object, not just extension types with a compile time known type.
>
> I kind of like that.

My problem with that is that if there is any other Python thread,
you're likely just going to sleep for thousands of CPU cycles as that
thread will keep the GIL. Doing this implicitly for operations with
such overhead would be unacceptable. I think writing 'with gil:' is
fine, it's the performance that's the problem in the first place which
prevents you from doing that, not the 9 characters you need to type.

What I would like is having Cython infer whether the GIL is needed for
a function, and mark it "implicitly nogil", so it can be called from
nogil contexts without actually having to declare it nogil. This would
only work for non-extern things, and you would still need to declare
it nogil in your pxd if you want to export it. Apparently many users
(even those that have used Cython quite a bit) are confused about what
nogil on functions actually does (or they are not even aware it
exists).
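
One way such an inference could work is a fixed-point pass over the call graph: a function needs the GIL if it touches Python objects itself or calls anything that needs the GIL. The sketch below is a toy Python model, not Cython's actual analysis; the function infer_nogil and the example names dot, norm, print_obj and log_result are all hypothetical.

```python
def infer_nogil(calls, touches_python):
    """Return the set of functions that never need the GIL.

    calls maps each function name to the names it calls;
    touches_python lists functions that use Python objects directly.
    """
    needs_gil = set(touches_python)
    changed = True
    while changed:
        changed = False
        for func, callees in calls.items():
            if func not in needs_gil and any(c in needs_gil for c in callees):
                needs_gil.add(func)  # propagate the requirement to the caller
                changed = True
    return set(calls) - needs_gil


# Hypothetical module: only print_obj touches Python objects directly.
calls = {
    "dot": [],
    "norm": ["dot"],
    "print_obj": [],
    "log_result": ["norm", "print_obj"],
}
implicitly_nogil = infer_nogil(calls, {"print_obj"})
```

Here dot and norm would be marked "implicitly nogil", while log_result would not, because it calls print_obj.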



Re: [Cython] Acquisition counted cdef classes

2011-10-26 Thread Dag Sverre Seljebotn

On 10/26/2011 11:45 AM, mark florisson wrote:


> My problem with that is that if there is any other Python thread,
> you're likely just going to sleep for thousands of CPU cycles as that
> thread will keep the GIL. Doing this implicitly for operations with
> such overhead would be unacceptable. I think writing 'with gil:' is
> fine, it's the performance that's the problem in the first place which
> prevents you from doing that, not the 9 characters you need to type.


Are you sure about the complete impossibility of having a separate
thread do all INCREFs and DECREFs posted to it asynchronously (in the
order they are posted), without race conditions?




> What I would like is having Cython infer whether the GIL is needed for
> a function, and mark it "implicitly nogil", so it can be called from
> nogil contexts without actually having to declare it nogil. This would
> only work for non-extern things, and you would still need to declare
> it nogil in your pxd if you want to export it. Apparently many users
> (even those that have used Cython quite a bit) are confused about what
> nogil on functions actually does (or they are not even aware it
> exists).


There's a long thread by me and Robert (with some input from Stefan) on
this from a couple of months back; I don't know if you read it. You could
support exports across pxds as well. Basically, for *every* cdef function,
export two function pointers:


 1) To a wrapper to be called if you hold the GIL (outside nogil sections)

 2) To a wrapper to be called if you don't hold the GIL, or don't know
    whether you hold the GIL (the wrapper can acquire the GIL if needed)


Taking the address of a function (for passing to C, e.g.) would give you 
the one that can be called without holding the GIL.
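
The two entry points can be sketched as follows, again as a plain Python model and not as generated Cython code. The re-entrant lock gil stands in for the GIL, and the names _scale_impl, scale_gil_held, scale_gil_unknown and callback are hypothetical.

```python
import threading

# Hypothetical stand-in for the GIL. An RLock is re-entrant, so acquiring it
# while already holding it is harmless, which models "acquire if needed".
gil = threading.RLock()


def _scale_impl(x):
    # Body of the (hypothetical) cdef function; assumes the lock is held.
    return 2 * x


def scale_gil_held(x):
    # Entry point 1: the caller guarantees it already holds the lock.
    return _scale_impl(x)


def scale_gil_unknown(x):
    # Entry point 2: safe when the caller does not hold the lock,
    # or does not know whether it does.
    with gil:
        return _scale_impl(x)


# "Taking the address" of the function hands out the always-safe entry point.
callback = scale_gil_unknown
```

Callers inside a section that holds the lock would be routed to the first entry point and pay nothing extra; everything else, including C code that received the pointer, goes through the second.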


The implication should hopefully be that we can get rid of "with gil" and
"nogil" on function declarations entirely.


Dag Sverre


Re: [Cython] Acquisition counted cdef classes

2011-10-26 Thread Dag Sverre Seljebotn

On 10/26/2011 11:45 AM, mark florisson wrote:


> My problem with that is that if there is any other Python thread,
> you're likely just going to sleep for thousands of CPU cycles as that
> thread will keep the GIL. Doing this implicitly for operations with
> such overhead would be unacceptable. I think writing 'with gil:' is
> fine, it's the performance that's the problem in the first place which
> prevents you from doing that, not the 9 characters you need to type.


I'm with Stefan here. We have more or less the exact same problem if you
inadvertently do arithmetic with Python floats rather than C doubles.
The workflow then is to check the HTML for yellow lines. The same goes for
the GIL (we could even introduce a new color in the HTML report to show
where you hold the GIL and where you don't).


The advice to get fast code is

But, we should also introduce directives that emit warnings in both of 
these situations, that you can use while developing to quickly pinpoint 
source code lines ("Type of variable not inferred", "GIL automatically 
acquired").


DS



> What I would like is having Cython infer whether the GIL is needed for
> a function, and mark it "implicitly nogil", so it can be called from
> nogil contexts without actually having to declare it nogil. This would
> only work for non-extern things, and you would still need to declare
> it nogil in your pxd if you want to export it. Apparently many users
> (even those that have used Cython quite a bit) are confused about what
> nogil on functions actually does (or they are not even aware it
> exists).




Re: [Cython] Acquisition counted cdef classes

2011-10-26 Thread Dag Sverre Seljebotn

On 10/26/2011 12:29 PM, Dag Sverre Seljebotn wrote:


> I'm with Stefan here. We have more or less the exact same problem if you
> inadvertently do arithmetic with Python floats rather than C doubles.
> The workflow then is to check the HTML for yellow lines. Same with the
> GIL (we could even introduce a new color in the HTML report for where
> you hold the GIL and not).
>
> The advice to get fast code is


Sorry, I keep hitting post too early... "The advice to get fast code is
still to 'eliminate the yellow lines'".


DS





Re: [Cython] Acquisition counted cdef classes

2011-10-26 Thread mark florisson
On 26 October 2011 11:23, Dag Sverre Seljebotn wrote:
> On 10/26/2011 11:45 AM, mark florisson wrote:
>>
>> My problem with that is that if there is any other Python thread,
>> you're likely just going to sleep for thousands of CPU cycles as that
>> thread will keep the GIL. Doing this implicitly for operations with
>> such overhead would be unacceptable. I think writing 'with gil:' is
>> fine, it's the performance that's the problem in the first place which
>> prevents you from doing that, not the 9 characters you need to type.
>
> You are sure about the complete impossibility of having a separate thread
> doing all INCREFs and DECREFs posted to it asynchronously (in the order they
> are posted), without race conditions?

No, I think it is possible, but I don't believe it will solve the
problem of DECREF preventing C compiler optimizations (unlikely()
should help there, though). The code will still have to submit an
asynchronous DECREF without races, which means it has to call some
kind of synchronized or atomically operating function, and that call
is what prevented the optimization. It would be nice to have, though,
as it would mean you can pass stuff around in nogil mode without
acquisition counting, and it would mean you can implement types that
can be used in nogil mode and can synchronize using their own lock
(if needed).

I wonder if deferring INCREFs is safe, though. Suppose you have one
reference and you INCREF it (deferred, because you don't have the
GIL). After obtaining the GIL, you call some function that steals your
reference, and you somehow cause the program to lose the stolen
reference, which causes the object to be collected. Only then does the
reference counter thread decide to do the INCREF (too late). You also
cannot INCREF atomically (Python's own INCREF is not atomic), so there
could be a race there as well. So I think you really need the GIL to
INCREF, and you need to do it synchronously (I'm not completely sure,
please feel free to poke holes in my logic any time :).
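
The scenario above can be replayed deterministically with a toy model. This is a single-threaded Python simulation, not CPython's real reference counting; the class Obj, the queue pending and the list too_late are hypothetical names used only to show the ordering problem.

```python
from collections import deque


class Obj:
    """Toy object with a manual reference count (hypothetical model)."""

    def __init__(self):
        self.refcount = 1    # the caller's single owned reference
        self.freed = False


def decref_now(obj):
    # Synchronous DECREF, as done by code that holds the GIL.
    obj.refcount -= 1
    if obj.refcount == 0:
        obj.freed = True     # the object is deallocated here


pending = deque()            # operations posted to the refcount thread
too_late = []                # deferred operations that hit a freed object

obj = Obj()
pending.append(("incref", obj))  # 1. INCREF deferred (no GIL held)
decref_now(obj)                  # 2. stolen reference is dropped under the GIL

# 3. The refcount thread only now gets to drain its queue.
while pending:
    op, target = pending.popleft()
    if target.freed:
        too_late.append(op)      # in C this would touch freed memory
    elif op == "incref":
        target.refcount += 1
```

In this ordering the deferred INCREF lands in too_late: the object was already freed by the synchronous DECREF before the queue was drained.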

I think it would be nicer to just fix this in CPython in any case,
though. Reference counting is terrible to work with in general
(regardless of whether the updates are applied immediately or
deferred), and it's part of the reason why we have a GIL (although
really not the only one). As long as CPython does reference counting,
removing the GIL is an absolute no-go (although I wonder how many
architectures don't support atomic reference counting).

Refcounting has upsides too, though. One is more deterministic
collection of objects and destructor calling. Of course this argument
becomes moot if you have a reference cycle somewhere.

Has anyone ever attempted to implement a garbage collector for
CPython? Or did everyone who wanted this feature move to PyPy?

>>
>> What I would like is having Cython infer whether the GIL is needed for
>> a function, and mark it "implicitly nogil", so it can be called from
>> nogil contexts without actually having to declare it nogil. This wo