Re: [Cython] Acquisition counted cdef classes
Greg Ewing, 26.10.2011 00:27:
> Dag Sverre Seljebotn wrote:
>
>> I'd gladly take a factor two (or even four) slowdown of CPython code any
>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores
>> and consider a 10x speedup better than nothing...
>
> Another thing to consider is that locking around refcount
> changes may not be as expensive in typical Cython code as
> it is in Python.
>
> The trouble with Python is that you can't so much as scratch
> your nose without touching a big pile of ref counts. But
> if the Cython code is only dealing with a few Python objects
> and doing most of its work at the C level, the relative
> overhead of locking around refcount changes may not be
> significant.
>
> So it may be worth trying the strategy of just acquiring
> the GIL whenever a refcount needs to be changed in a nogil
> section, and damn the consequences.

Hmm, interesting. That would give new semantics to "nogil" sections,
basically:

"""
You can do Python interaction in nogil code, however, this will slow down
your code. Cython will generate C code to acquire and release the GIL around
any Python interaction that your code performs, thus serialising any calls
into the CPython runtime. If you want to avoid this serialisation, use
"cython -a" to find out where Python interaction happens and use static
typing to let Cython generate C code instead.
"""

In other words: "with gil" sections hold the GIL by default and give it away
on explicit request, whereas "nogil" sections have the GIL released by
default and acquire it on implicit need.

The advantage over object level locking is that this does not increase the
in-memory size of the object structs, and that it works with *any* Python
object, not just extension types with a compile time known type.

I kind of like that.

Stefan
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
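[Editorial sketch] The proposed semantics can be modelled in plain Python. This is only a toy under stated assumptions: `CountingGIL`, `python_interaction` and `nogil_worker` are hypothetical names, an ordinary `threading.Lock` stands in for the real GIL, and nothing here is code that Cython generates. It shows the shape of the idea: the numeric loop runs without the lock, and the lock is taken only around the single step that touches a shared Python object.

```python
import threading
from contextlib import contextmanager


class CountingGIL:
    """Hypothetical stand-in for the GIL that counts how often it is taken."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    @contextmanager
    def python_interaction(self):
        # Models "acquire on implicit need": held only around the interaction.
        with self._lock:
            self.acquisitions += 1
            yield


def nogil_worker(gil, values, results):
    # Models the C-level part of a nogil section: no lock is held here.
    total = 0.0
    for v in values:
        total += v * v
    # Models the one Python interaction: appending to a shared list.
    with gil.python_interaction():
        results.append(total)


def run(num_threads=4):
    gil = CountingGIL()
    results = []
    threads = [
        threading.Thread(target=nogil_worker, args=(gil, [1.0, 2.0, 3.0], results))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, gil.acquisitions
```

In this model the lock is taken once per worker, however long the numeric loop runs. That is the case Greg describes, where code does most of its work at the C level.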
Re: [Cython] Acquisition counted cdef classes
On 26 October 2011 08:56, Stefan Behnel wrote:
> Greg Ewing, 26.10.2011 00:27:
>>
>> Dag Sverre Seljebotn wrote:
>>
>>> I'd gladly take a factor two (or even four) slowdown of CPython code any
>>> day to get rid of the GIL :-). The thing is, sometimes one has 48 cores
>>> and consider a 10x speedup better than nothing...
>>
>> Another thing to consider is that locking around refcount
>> changes may not be as expensive in typical Cython code as
>> it is in Python.
>>
>> The trouble with Python is that you can't so much as scratch
>> your nose without touching a big pile of ref counts. But
>> if the Cython code is only dealing with a few Python objects
>> and doing most of its work at the C level, the relative
>> overhead of locking around refcount changes may not be
>> significant.
>>
>> So it may be worth trying the strategy of just acquiring
>> the GIL whenever a refcount needs to be changed in a nogil
>> section, and damn the consequences.
>
> Hmm, interesting. That would give new semantics to "nogil" sections,
> basically:
>
> """
> You can do Python interaction in nogil code, however, this will slow down
> your code. Cython will generate C code to acquire and release the GIL around
> any Python interaction that your code performs, thus serialising any calls
> into the CPython runtime. If you want to avoid this serialisation, use
> "cython -a" to find out where Python interaction happens and use static
> typing to let Cython generate C code instead.
> """
>
> In other words: "with gil" sections hold the GIL by default and give it away
> on explicit request, whereas "nogil" sections have the GIL released by
> default and acquire it on implicit need.
>
> The advantage over object level locking is that this does not increase the
> in-memory size of the object structs, and that it works with *any* Python
> object, not just extension types with a compile time known type.
>
> I kind of like that.
My problem with that is that if there is any other Python thread, you're
likely just going to sleep for thousands of CPU cycles, as that thread will
keep the GIL. Doing this implicitly for operations with such overhead would
be unacceptable. I think writing 'with gil:' is fine; it's the performance
that's the problem in the first place which prevents you from doing that,
not the 9 characters you need to type.

What I would like is having Cython infer whether the GIL is needed for a
function, and mark it "implicitly nogil", so it can be called from nogil
contexts without actually having to declare it nogil. This would only work
for non-extern things, and you would still need to declare it nogil in your
pxd if you want to export it. Apparently many users (even those that have
used Cython quite a bit) are confused about what nogil on functions actually
does (or they are not even aware it exists).
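[Editorial sketch] The "implicitly nogil" inference can be pictured as a fixed-point pass over a call graph. The function below is a minimal model, not Cython's implementation: the input format (`name -> (touches_python, callees)`) and all function names in the example are assumptions made up for illustration. Callees missing from the table are treated as extern and, conservatively, as needing the GIL, matching the remark that inference would only work for non-extern things.

```python
def infer_implicit_nogil(functions):
    """Return the names of functions that never need the GIL.

    `functions` maps a name to a pair (touches_python, callees), where
    `touches_python` is True if the body itself manipulates Python objects
    and `callees` is the set of names it calls. This format is hypothetical.
    """
    needs_gil = {name for name, (touches, _) in functions.items() if touches}
    changed = True
    while changed:
        changed = False
        for name, (_, callees) in functions.items():
            if name in needs_gil:
                continue
            # A callee that needs the GIL, or an unknown (extern) callee,
            # forces the caller to need the GIL as well.
            if any(c in needs_gil or c not in functions for c in callees):
                needs_gil.add(name)
                changed = True
    return set(functions) - needs_gil


# Hypothetical module: only `dot` and `norm` are free of Python interaction.
example = {
    "dot": (False, set()),
    "norm": (False, {"dot"}),
    "log_result": (True, set()),
    "solve": (False, {"norm", "log_result"}),
    "wrap_extern": (False, {"some_extern_c_func"}),
}
implicit_nogil = infer_implicit_nogil(example)
```

Here `solve` is excluded because it calls `log_result`, and `wrap_extern` is excluded because its callee is unknown to the analysis.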
Re: [Cython] Acquisition counted cdef classes
On 10/26/2011 11:45 AM, mark florisson wrote:
> On 26 October 2011 08:56, Stefan Behnel wrote:
>> [...]
> My problem with that is that if there is any other Python thread, you're
> likely just going to sleep for thousands of CPU cycles, as that thread will
> keep the GIL. Doing this implicitly for operations with such overhead would
> be unacceptable. I think writing 'with gil:' is fine; it's the performance
> that's the problem in the first place which prevents you from doing that,
> not the 9 characters you need to type.

You are sure about the complete impossibility of having a separate thread
doing all INCREFs and DECREFs posted to it asynchronously (in the order they
are posted), without race conditions?

> What I would like is having Cython infer whether the GIL is needed for a
> function, and mark it "implicitly nogil", so it can be called from nogil
> contexts without actually having to declare it nogil. This would only work
> for non-extern things, and you would still need to declare it nogil in your
> pxd if you want to export it. Apparently many users (even those that have
> used Cython quite a bit) are confused about what nogil on functions actually
> does (or they are not even aware it exists).

There's a long thread by me and Robert (and some of Stefan) on this from a
couple of months back, don't know if you read it.

You could support exports across pxds as well. Basically for *every* cdef
function, export two function pointers:

 1) To a wrapper to be called if you hold the GIL (outside nogil sections)

 2) To a wrapper to be called if you don't hold the GIL, or don't know
    whether you hold the GIL (the wrapper can acquire the GIL if needed)

Taking the address of a function (for passing to C, e.g.) would give you the
one that can be called without holding the GIL.

The implications should hopefully be getting rid of "with gil" and "nogil"
on function declarations entirely.

Dag Sverre
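[Editorial sketch] The two-entry-point export can be modelled with a lock that remembers its owner. Everything named here (`OwnedGIL`, `export`, `with_gil_entry`, `any_context_entry`) is hypothetical and exists only for illustration. In real generated code the entries would be C function pointers and the lock would be the actual GIL.

```python
import threading


class OwnedGIL:
    """Hypothetical stand-in for the GIL that records which thread holds it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def held_by_current_thread(self):
        return self._owner == threading.get_ident()

    def acquire(self):
        self._lock.acquire()
        self._owner = threading.get_ident()

    def release(self):
        self._owner = None
        self._lock.release()


def export(body, gil):
    """Return the two entry points described above for one function body."""

    def with_gil_entry(*args):
        # Entry 1: the caller guarantees that it already holds the GIL.
        if not gil.held_by_current_thread():
            raise RuntimeError("with_gil_entry called without the GIL")
        return body(*args)

    def any_context_entry(*args):
        # Entry 2: safe when the GIL is not held, or its state is unknown.
        # This is the one that "taking the address" would hand to C code.
        if gil.held_by_current_thread():
            return body(*args)
        gil.acquire()
        try:
            return body(*args)
        finally:
            gil.release()

    return with_gil_entry, any_context_entry


gil = OwnedGIL()
fast_entry, safe_entry = export(lambda x: 2 * x, gil)
```

The design choice to note is that the cost of checking and acquiring is paid only through the second entry. Callers that are known to hold the GIL go through the first and pay nothing extra.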
Re: [Cython] Acquisition counted cdef classes
On 10/26/2011 11:45 AM, mark florisson wrote:
> On 26 October 2011 08:56, Stefan Behnel wrote:
>> [...]
> My problem with that is that if there is any other Python thread, you're
> likely just going to sleep for thousands of CPU cycles, as that thread will
> keep the GIL. Doing this implicitly for operations with such overhead would
> be unacceptable. I think writing 'with gil:' is fine; it's the performance
> that's the problem in the first place which prevents you from doing that,
> not the 9 characters you need to type.

I'm with Stefan here. We have more or less the exact same problem if you
inadvertently do arithmetic with Python floats rather than C doubles. The
workflow then is to check the HTML for yellow lines. Same with the GIL (we
could even introduce a new color in the HTML report for where you hold the
GIL and where you don't).

The advice to get fast code is

But, we should also introduce directives that emit warnings in both of these
situations, that you can use while developing to quickly pinpoint source
code lines ("Type of variable not inferred", "GIL automatically acquired").

DS

> What I would like is having Cython infer whether the GIL is needed for a
> function, and mark it "implicitly nogil", so it can be called from nogil
> contexts without actually having to declare it nogil. This would only work
> for non-extern things, and you would still need to declare it nogil in your
> pxd if you want to export it. Apparently many users (even those that have
> used Cython quite a bit) are confused about what nogil on functions actually
> does (or they are not even aware it exists).
Re: [Cython] Acquisition counted cdef classes
On 10/26/2011 12:29 PM, Dag Sverre Seljebotn wrote:
> On 10/26/2011 11:45 AM, mark florisson wrote:
>> On 26 October 2011 08:56, Stefan Behnel wrote:
>>> [...]
>> My problem with that is that if there is any other Python thread, you're
>> likely just going to sleep for thousands of CPU cycles, as that thread will
>> keep the GIL. Doing this implicitly for operations with such overhead would
>> be unacceptable. I think writing 'with gil:' is fine; it's the performance
>> that's the problem in the first place which prevents you from doing that,
>> not the 9 characters you need to type.
>
> I'm with Stefan here. We have more or less the exact same problem if you
> inadvertently do arithmetic with Python floats rather than C doubles. The
> workflow then is to check the HTML for yellow lines. Same with the GIL (we
> could even introduce a new color in the HTML report for where you hold the
> GIL and where you don't).
>
> The advice to get fast code is

Sorry, I keep hitting post too early... "The advice to get fast code is
still to 'eliminate the yellow lines'".

DS

> But, we should also introduce directives that emit warnings in both of these
> situations, that you can use while developing to quickly pinpoint source
> code lines ("Type of variable not inferred", "GIL automatically acquired").
>
> DS
>
>> What I would like is having Cython infer whether the GIL is needed for a
>> function, and mark it "implicitly nogil", so it can be called from nogil
>> contexts without actually having to declare it nogil. This would only work
>> for non-extern things, and you would still need to declare it nogil in your
>> pxd if you want to export it. Apparently many users (even those that have
>> used Cython quite a bit) are confused about what nogil on functions actually
>> does (or they are not even aware it exists).
Re: [Cython] Acquisition counted cdef classes
On 26 October 2011 11:23, Dag Sverre Seljebotn wrote:
> On 10/26/2011 11:45 AM, mark florisson wrote:
>>
>> On 26 October 2011 08:56, Stefan Behnel wrote:
>>> [...]
>>
>> My problem with that is that if there is any other Python thread,
>> you're likely just going to sleep for thousands of CPU cycles, as that
>> thread will keep the GIL. Doing this implicitly for operations with
>> such overhead would be unacceptable. I think writing 'with gil:' is
>> fine; it's the performance that's the problem in the first place which
>> prevents you from doing that, not the 9 characters you need to type.
>
> You are sure about the complete impossibility of having a separate thread
> doing all INCREFs and DECREFs posted to it asynchronously (in the order they
> are posted), without race conditions?

No, I think it is possible, but I don't believe it will solve the problem of
DECREFs preventing C compiler optimizations (unlikely() should help there,
though). The code would still have to submit an asynchronous DECREF without
races, which means it has to call some kind of synchronized or atomically
operating function, and that call is what prevented the optimization in the
first place. It would be nice to have, as it would mean you can pass stuff
around in nogil mode without acquisition counting, and it would mean you can
implement these types that can be used in nogil mode and can synchronize
using their own lock (if needed).

I wonder if deferring INCREFs is safe, though. What if you have one
reference, you INCREF (deferred, because you don't have the GIL), you call
some function that steals your reference (after you obtained the GIL), you
somehow cause the program to lose the stolen reference, which causes the
object to be collected, and then the reference counter thread decides to do
the INCREF (too late)? You also cannot atomically INCREF, and Python doesn't
do that, so there could be a race there as well.
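[Editorial sketch] The deferred-INCREF scenario can be replayed deterministically on a toy reference-count table. `ToyHeap` and its methods are hypothetical and have nothing to do with CPython's real object model. The sketch only reproduces the ordering problem: the deferred INCREF is applied after a synchronous DECREF has already freed the object.

```python
from collections import deque


class ToyHeap:
    """Hypothetical refcount table with a queue of deferred INCREFs."""

    def __init__(self):
        self.refcounts = {}
        self.deferred = deque()

    def new(self, name):
        self.refcounts[name] = 1  # a fresh object starts with one reference

    def incref(self, name):
        if name not in self.refcounts:
            raise RuntimeError(f"INCREF on freed object {name!r}")
        self.refcounts[name] += 1

    def decref(self, name):
        self.refcounts[name] -= 1
        if self.refcounts[name] == 0:
            del self.refcounts[name]  # models immediate deallocation

    def post_incref(self, name):
        # Models posting an INCREF to the refcounting thread without the GIL.
        self.deferred.append(name)

    def drain(self):
        # Models the refcounting thread applying posted operations in order.
        while self.deferred:
            self.incref(self.deferred.popleft())


def scenario():
    heap = ToyHeap()
    heap.new("obj")          # we hold the only reference
    heap.post_incref("obj")  # deferred INCREF, not yet applied
    # A callee steals our one real reference and the program then drops it,
    # synchronously and under the GIL:
    heap.decref("obj")       # count reaches 0, object is freed
    try:
        heap.drain()         # the deferred INCREF arrives too late
    except RuntimeError as exc:
        return str(exc)
    return "no error"
```

In this model the failure does not depend on thread timing. It follows purely from the INCREF being applied later than the DECREF that it was meant to precede.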
So I think you really need the GIL to INCREF, and you need to do it
synchronously (I'm not completely sure, please feel free to poke holes in my
logic any time :).

I think it would be nicer to just fix this in CPython in any case, though.
Reference counting is terrible to work with in general (regardless of
whether the refcount operations happen immediately or are deferred), and
it's part of the reason why we have a GIL (although really not the only
one). As long as CPython does reference counting, removing the GIL is an
absolute no-go (although I wonder how many architectures don't support
atomic reference counting).

Refcounting has upsides too, though. One is more deterministic collection of
objects and destructor calling. Of course this argument becomes moot if you
have a reference cycle somewhere. Has anyone ever attempted to implement a
garbage collector for CPython? Or did everyone who wanted this feature move
to PyPy?

>> What I would like is having Cython infer whether the GIL is needed for
>> a function, and mark it "implicitly nogil", so it can be called from
>> nogil contexts without actually having to declare it nogil. This wo