Re: [Cython] Acquisition counted cdef classes
mark florisson, 24.10.2011 21:50:
> This is in response to http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f and http://trac.cython.org/cython_trac/ticket/498 , and some of the previous discussion on cython.parallel.
>
> Basically I think we should have something more powerful than 'cdef borrowed CdefClass obj', something that also doesn't rely on new syntax.

We will still need borrowed reference support in the compiler eventually, whether we make it a language feature or not.

> What if we support acquisition counting for every instance of a cdef class? In Python and Cython GIL mode you use reference counting, and in Cython nogil mode and for struct attributes, array dtypes etc you use acquisition counting. This allows you to pass around cdef objects without the GIL and use their nogil methods. If the acquisition count is greater than 1, the acquisition count owns a reference to the object. If it reaches 0 you discard your owned reference (you can simply acquire the GIL if you don't have it) and when you increment from zero you obtain it. Perhaps something like libatomic could be used to efficiently implement this.

Where would you store that count? In the object struct? That would increase the size of each instance.

> The advantages are:
>
> 1) allow users to pass around cdef typed objects in nogil mode
> 2) allow cdef typed objects as struct attributes or array elements
> 3) make it easy to implement things like memoryviews (already done but would have been a lot easier), cython.parallel.async/future objects, cython.parallel.mutex objects and possibly other things in the future

Would it really be easier? You can already call cdef methods in nogil mode, AFAIR.

> We should then allow a syntax like
>
>     with mycdefobject:
>         ...
>
> to lock the object in GIL or nogil mode (like java's 'synchronized'). For objects that already have __enter__ and __exit__ you could support something like 'with cython.synchronized(mycdefobject): ...' instead. Or perhaps you should always require cython.synchronized (or cython.parallel.synchronized).

The latter, I sure hope.

> In addition to nogil methods a user may provide special cdef nogil methods, i.e.
>
>     cdef int __len__(self) nogil:
>         ...
>
> which would provide a Cython as well as a Python implementation for the function (with automatic cpdef behaviour), so you could use it in both contexts.

That can already be done for final types, simply by adding cpdef behaviour to all special methods. That would also fix ticket #3, for example.

Note that the DefNode refactoring is still pending, it would help here.

> There are two options for assignment semantics to a struct attribute or array element:
> - decref the old value (this implies always initializing the pointers to NULL first)
> - don't decref the old value (the user has to manually use 'del')
>
> I think 1) is definitely more consistent with how everything else works.

Yes.

> All of this functionality should also get a sane C API (to be provided by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc. Every class using this functionality is a subclass of CythonObject (that contains a PyObject + an acquisition count + a lock). Perhaps if the user is subclassing something other than object we could allow the user to specify custom __cython_(un)lock__ and __cython_acquisition_count__ methods and fields.
>
> Now, building on top of this functionality, Cython could provide built-in nogil-compatible types, like lists, dicts and maybe tuples (as a start). These will by default not lock for operations to allow e.g. one thread to iterate over the list and another thread to index it without lock contention and other general overhead. If one thread is somehow changing the size of the list, or writing to indices that another thread is reading from/writing to, the results will of course be undefined unless the user synchronizes on the object. So it would be the user's responsibility. The acquisition counting itself will always be thread-safe (i.e., it will be atomic if possible, otherwise it will lock).
>
> It's probably best to not enable this functionality by default as it would be more expensive to instantiate objects, but it could be supported through a cdef class decorator and a general directive.

It's well known that this would be expensive. One of the approaches that tried to get rid of the GIL in CPython introduced fine grained locking, and it turned out to be substantially slower, AFAIR by a factor of two.

You could potentially drop the locking for local variables, but you'd lose that ability as soon as the 'object' is passed into a function.

Basically, what you are trying to do here is to duplicate the complete ref-counting infrastructure of CPython, but without using CPython.

> Of course one may still use non-cdef borrowed objects, by simply casting to a PyObject *.

That's very ugly, though, because you lose all access to methods and attributes of
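The acquisition-counting rule from the proposal can be modelled in a few lines of Python. This is only a sketch under stated assumptions: "greater than 1" is read as "at least 1" (which is what the "increment from zero" wording implies), the class name `CountedObject` and its `refcount` field are hypothetical stand-ins for a cdef class instance and its Python reference count, and a `threading.Lock` stands in for the atomic operations the proposal hopes to use.

```python
import threading


class CountedObject:
    """Toy model: all acquisitions together own exactly one reference."""

    def __init__(self):
        self.refcount = 1                # simulated Python reference count
        self.acquisition_count = 0       # simulated nogil-side count
        self._lock = threading.Lock()    # assumption: stands in for atomics

    def acquire(self):
        with self._lock:
            self.acquisition_count += 1
            if self.acquisition_count == 1:
                self.refcount += 1       # incrementing from zero obtains the reference

    def release(self):
        with self._lock:
            if self.acquisition_count == 0:
                raise RuntimeError("release() without matching acquire()")
            self.acquisition_count -= 1
            if self.acquisition_count == 0:
                self.refcount -= 1       # reaching zero discards the reference


def hammer(obj, rounds):
    # Each worker acquires and releases repeatedly, as nogil code might.
    for _ in range(rounds):
        obj.acquire()
        obj.release()


obj = CountedObject()
obj.acquire()                            # the main thread keeps one acquisition
workers = [threading.Thread(target=hammer, args=(obj, 1000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
held_refcount = obj.refcount             # the count still owns its one reference
obj.release()
final_refcount = obj.refcount            # back to the creator's reference only
```

However many acquisitions are outstanding, the simulated reference count is touched only on the 0-to-1 and 1-to-0 transitions, which is the point at which the real implementation would need the GIL.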
Re: [Cython] Acquisition counted cdef classes
On 25 October 2011 05:47, Robert Bradshaw wrote:
> On Mon, Oct 24, 2011 at 2:52 PM, mark florisson wrote:
>> On 24 October 2011 22:03, Greg Ewing wrote:
>>> mark florisson wrote:
>>>> These will by default not lock for operations to allow e.g. one thread to iterate over the list and another thread to index it without lock contention and other general overhead.
>>>
>>> I don't think that's safe. You can't say "I'm not modifying this, so I don't need to lock it" because there may be another thread that *is* in the midst of modifying it.
>>
>> I was really thinking of the case where you instantiate it in Cython and then do some parallel work, in which case you're the only user. But you can't assume that in general.
>
> It could be useful to assert for a chunk of code that a given object is read-only and will not be mutated for the duration of the context (programmer error and strange crash/data corruption if it is). E.g.
>
>     with nogil, assert_frozen(my_dict):
>         a = (my_dict[key]).c_attribute
>         [...]
>
> All references obtained could be borrowed. Perhaps we could even enforce this for cdef classes (but perhaps not consistently enough, and perhaps that would make things even more confusing). Just a thought.

Hmm, I actually think that passing around references in general (without having to declare them as borrowed in parameters) would be a good feature. If my_dict would be e.g. a cython.types.dict, then it would only accept CythonObjects, so it could just do the acquisition counting.

For cython.parallel we could provide types more suited for the cython.parallel kind of fine-grained parallelism, e.g. lock for writes, don't lock for reads, which allows either to happen simultaneously, but not any mixing of those two. Through explicit or implicit barriers one may be sure that operations are correct.
> - Robert

___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
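The idea of separating writes from reads with a barrier, so that neither phase needs a per-operation lock, can be illustrated with plain Python threads. This is a sketch only: it uses `threading.Barrier` from the standard library rather than anything in cython.parallel, and the names `worker`, `data` and `totals` are made up for the illustration.

```python
import threading

N = 4
data = [0] * N        # shared container, written in phase 1, read in phase 2
totals = [0] * N      # one result slot per thread
barrier = threading.Barrier(N)


def worker(i):
    data[i] = i * i           # write phase: each thread touches only its own slot
    barrier.wait()            # explicit barrier: all writes are finished here
    totals[i] = sum(data)     # read phase: unlocked reads, nobody is writing


threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every thread sees the same fully written list in the read phase. Correctness rests entirely on the barrier, not on the container, which matches the "user's responsibility" stance taken in the thread.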
Re: [Cython] Acquisition counted cdef classes
On 25 October 2011 08:33, Stefan Behnel wrote:
> mark florisson, 24.10.2011 21:50:
>>
>> This is in response to http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f and http://trac.cython.org/cython_trac/ticket/498 , and some of the previous discussion on cython.parallel.
>>
>> Basically I think we should have something more powerful than 'cdef borrowed CdefClass obj', something that also doesn't rely on new syntax.
>
> We will still need borrowed reference support in the compiler eventually, whether we make it a language feature or not.

I'm not sure I understand why, acquisition counting could solve these problems for cdef classes, and general objects may not be used without the GIL. Do you want this as an optimization?

>> What if we support acquisition counting for every instance of a cdef class? In Python and Cython GIL mode you use reference counting, and in Cython nogil mode and for struct attributes, array dtypes etc you use acquisition counting. This allows you to pass around cdef objects without the GIL and use their nogil methods. If the acquisition count is greater than 1, the acquisition count owns a reference to the object. If it reaches 0 you discard your owned reference (you can simply acquire the GIL if you don't have it) and when you increment from zero you obtain it. Perhaps something like libatomic could be used to efficiently implement this.
>
> Where would you store that count? In the object struct? That would increase the size of each instance.

Yes, not just the count, also the lock. This feature would be optional and may be very useful for people (I think).

>> The advantages are:
>>
>> 1) allow users to pass around cdef typed objects in nogil mode
>> 2) allow cdef typed objects as struct attributes or array elements
>> 3) make it easy to implement things like memoryviews (already done but would have been a lot easier), cython.parallel.async/future objects, cython.parallel.mutex objects and possibly other things in the future
>
> Would it really be easier? You can already call cdef methods in nogil mode, AFAIR.

Sure, but you cannot store cdef objects as struct attributes, array elements (you could implement it with reference counting, but not for nogil mode), and you cannot pass them around without the GIL. This proposal is about making your life easier without the GIL, and currently it's kind of a pain.

>> We should then allow a syntax like
>>
>>     with mycdefobject:
>>         ...
>>
>> to lock the object in GIL or nogil mode (like java's 'synchronized'). For objects that already have __enter__ and __exit__ you could support something like 'with cython.synchronized(mycdefobject): ...' instead. Or perhaps you should always require cython.synchronized (or cython.parallel.synchronized).
>
> The latter, I sure hope.
>
>> In addition to nogil methods a user may provide special cdef nogil methods, i.e.
>>
>>     cdef int __len__(self) nogil:
>>         ...
>>
>> which would provide a Cython as well as a Python implementation for the function (with automatic cpdef behaviour), so you could use it in both contexts.
>
> That can already be done for final types, simply by adding cpdef behaviour to all special methods. That would also fix ticket #3, for example.
>
> Note that the DefNode refactoring is still pending, it would help here.

Ah I assumed cpdef nogil was invalid, I see it isn't, cool. This breaks terribly for special methods though.

>> There are two options for assignment semantics to a struct attribute or array element:
>> - decref the old value (this implies always initializing the pointers to NULL first)
>> - don't decref the old value (the user has to manually use 'del')
>>
>> I think 1) is definitely more consistent with how everything else works.
>
> Yes.
>
>> All of this functionality should also get a sane C API (to be provided by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc. Every class using this functionality is a subclass of CythonObject (that contains a PyObject + an acquisition count + a lock). Perhaps if the user is subclassing something other than object we could allow the user to specify custom __cython_(un)lock__ and __cython_acquisition_count__ methods and fields.
>>
>> Now, building on top of this functionality, Cython could provide built-in nogil-compatible types, like lists, dicts and maybe tuples (as a start). These will by default not lock for operations to allow e.g. one thread to iterate over the list and another thread to index it without lock contention and other general overhead. If one thread is somehow changing the size of the list, or writing to indices that another thread is reading from/writing to, the results will of course be undefined unless the user synchronizes on the object. So it would be the user's responsibility. The acquisition counting itsel
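What "synchronizing on the object" might look like can be sketched in pure Python. Everything here is an assumption for illustration: `synchronized` is a hypothetical helper modelled on the proposed `cython.synchronized`, and `Lockable` plays the role of the per-object lock that the proposal places in `CythonObject`. Nothing below is an existing Cython API.

```python
import threading
from contextlib import contextmanager


class Lockable:
    """Hypothetical base class carrying a per-object lock."""

    def __init__(self):
        self._lock = threading.RLock()


@contextmanager
def synchronized(obj):
    # Hold obj's own lock for the duration of the with-block.
    obj._lock.acquire()
    try:
        yield obj
    finally:
        obj._lock.release()


class SharedList(Lockable):
    def __init__(self):
        super().__init__()
        self.items = []


def append_many(shared, n):
    for _ in range(n):
        with synchronized(shared):
            # Read-then-append is one critical section, so items[k] == k.
            shared.items.append(len(shared.items))


shared = SharedList()
threads = [threading.Thread(target=append_many, args=(shared, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Using an explicit helper instead of `with obj:` leaves `__enter__` and `__exit__` free for the class's own context-manager protocol, which is the reason given in the thread for preferring the explicit form.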
Re: [Cython] Acquisition counted cdef classes
mark florisson, 25.10.2011 11:11:
> On 25 October 2011 08:33, Stefan Behnel wrote:
>> mark florisson, 24.10.2011 21:50:
>>> This is in response to http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f and http://trac.cython.org/cython_trac/ticket/498 , and some of the previous discussion on cython.parallel.
>>>
>>> Basically I think we should have something more powerful than 'cdef borrowed CdefClass obj', something that also doesn't rely on new syntax.
>>
>> We will still need borrowed reference support in the compiler eventually, whether we make it a language feature or not.
>
> I'm not sure I understand why, acquisition counting could solve these problems for cdef classes, and general objects may not be used without the GIL. Do you want this as an optimization?

Yes. Think of type(x), for example, or PyDict_GetItem(). They return borrowed references, and in many cases, Cython wouldn't have to INCREF and DECREF them when they are only being used as part of some specific kinds of expressions. The same applies to some utility functions in Cython that currently must INCREF their return value unconditionally, simply because they can't tell Cython that they could also return a borrowed reference instead. If there was a way to do that, we could optimise the reference counting away in a couple of more places, which would get us another bit closer to hand-tuned code.

However, note that this doesn't necessarily have an impact on nogil code. If you took a borrowed reference in one nogil thread, and a gil-holding thread deletes the object at the same time or during the lifetime of the borrowed reference (e.g. by updating a dict or assigning to a cdef attribute), the nogil thread would end up with a dead pointer in its hands. That's why the usage of borrowed references needs to be explicit in the code ("I know what I'm doing"), and the optimisations require the GIL to be held.

>>> What if we support acquisition counting for every instance of a cdef class? In Python and Cython GIL mode you use reference counting, and in Cython nogil mode and for struct attributes, array dtypes etc you use acquisition counting. This allows you to pass around cdef objects without the GIL and use their nogil methods. If the acquisition count is greater than 1, the acquisition count owns a reference to the object. If it reaches 0 you discard your owned reference (you can simply acquire the GIL if you don't have it) and when you increment from zero you obtain it. Perhaps something like libatomic could be used to efficiently implement this.
>>
>> Where would you store that count? In the object struct? That would increase the size of each instance.
>
> Yes, not just the count, also the lock. This feature would be optional and may be very useful for people (I think).

Well, as long as it's an optional feature that requires a class decorator, the only obvious drawback is that it'll bloat the compiler even more than it is already.

>>> The advantages are:
>>>
>>> 1) allow users to pass around cdef typed objects in nogil mode
>>> 2) allow cdef typed objects as struct attributes or array elements
>>> 3) make it easy to implement things like memoryviews (already done but would have been a lot easier), cython.parallel.async/future objects, cython.parallel.mutex objects and possibly other things in the future
>>
>> Would it really be easier? You can already call cdef methods in nogil mode, AFAIR.
>
> Sure, but you cannot store cdef objects as struct attributes, array elements (you could implement it with reference counting, but not for nogil mode)

You could do that with borrowed references, though, assuming that you keep another reference around (or do your own ref-counting). However, I do see that keeping a real reference around may be hard to do in some cases.

> and you cannot pass them around without the GIL.

Yes, you can, as long as you only go through cdef functions. Obviously, you can't pass them into a Python function call, but you can (and could, if it was implemented) do loads of useful things with existing references even in nogil sections. The GIL checker is quite fine grained already but could do even better.

> This proposal is about making your life easier without the GIL, and currently it's kind of a pain.

The nogil sections I use are usually quite short, so I can't tell. It's certainly a pain to work without the GIL, because it means you have to take a lot more care when writing your code. But that won't change just by dropping reference counting. And nogil code will definitely become another bit harder to get right when using borrowed references.

> Ah I assumed cpdef nogil was invalid, I see it isn't, cool.

It makes perfect sense. Just because a function *can* be called without the GIL doesn't mean it can't be called from Python. So the Python wrapper requires the GIL, but the underlying cdef function doesn't.

> This breaks terribly for special methods though.

Why? It's just a matter of properl
Re: [Cython] Acquisition counted cdef classes
On 10/25/2011 09:33 AM, Stefan Behnel wrote:
> mark florisson, 24.10.2011 21:50:
>> [...]
>> It's probably best to not enable this functionality by default as it would be more expensive to instantiate objects, but it could be supported through a cdef class decorator and a general directive.
>
> It's well known that this would be expensive. One of the approaches that tried to get rid of the GIL in CPython introduced fine grained locking, and it turned out to be substantially slower, AFAIR by a factor of two.

I'd gladly take a factor two (or even four) slowdown of CPython code any day to get rid of the GIL :-).

The thing is, sometimes one has 48 cores and consider a 10x speedup better than nothing...

Dag Sverre
Re: [Cython] Acquisition counted cdef classes
Dag Sverre Seljebotn, 25.10.2011 15:28:
> On 10/25/2011 09:33 AM, Stefan Behnel wrote:
>> mark florisson, 24.10.2011 21:50:
>>> All of this functionality should also get a sane C API (to be provided by cython.h). You'd get a Cy_INCREF(obj, have_gil)/Cy_DECREF() etc. Every class using this functionality is a subclass of CythonObject (that contains a PyObject + an acquisition count + a lock). Perhaps if the user is subclassing something other than object we could allow the user to specify custom __cython_(un)lock__ and __cython_acquisition_count__ methods and fields.
>>>
>>> Now, building on top of this functionality, Cython could provide built-in nogil-compatible types, like lists, dicts and maybe tuples (as a start). These will by default not lock for operations to allow e.g. one thread to iterate over the list and another thread to index it without lock contention and other general overhead. If one thread is somehow changing the size of the list, or writing to indices that another thread is reading from/writing to, the results will of course be undefined unless the user synchronizes on the object. So it would be the user's responsibility. The acquisition counting itself will always be thread-safe (i.e., it will be atomic if possible, otherwise it will lock).
>>>
>>> It's probably best to not enable this functionality by default as it would be more expensive to instantiate objects, but it could be supported through a cdef class decorator and a general directive.
>>
>> It's well known that this would be expensive. One of the approaches that tried to get rid of the GIL in CPython introduced fine grained locking, and it turned out to be substantially slower, AFAIR by a factor of two.
>
> I'd gladly take a factor two (or even four) slowdown of CPython code any day to get rid of the GIL :-).
>
> The thing is, sometimes one has 48 cores and consider a 10x speedup better than nothing...

Ah, sorry, that factor was for single-threaded code. How it would scale for multi-core code depends on too many factors to make any general statement.

Stefan
Re: [Cython] Acquisition counted cdef classes
On 25 October 2011 12:22, Stefan Behnel wrote:
> mark florisson, 25.10.2011 11:11:
>>
>> On 25 October 2011 08:33, Stefan Behnel wrote:
>>>
>>> mark florisson, 24.10.2011 21:50:
>>>> This is in response to http://groups.google.com/group/cython-users/browse_thread/thread/bcbc5fe0e329224f and http://trac.cython.org/cython_trac/ticket/498 , and some of the previous discussion on cython.parallel.
>>>>
>>>> Basically I think we should have something more powerful than 'cdef borrowed CdefClass obj', something that also doesn't rely on new syntax.
>>>
>>> We will still need borrowed reference support in the compiler eventually, whether we make it a language feature or not.
>>
>> I'm not sure I understand why, acquisition counting could solve these problems for cdef classes, and general objects may not be used without the GIL. Do you want this as an optimization?
>
> Yes. Think of type(x), for example, or PyDict_GetItem(). They return borrowed references, and in many cases, Cython wouldn't have to INCREF and DECREF them when they are only being used as part of some specific kinds of expressions. The same applies to some utility functions in Cython that currently must INCREF their return value unconditionally, simply because they can't tell Cython that they could also return a borrowed reference instead. If there was a way to do that, we could optimise the reference counting away in a couple of more places, which would get us another bit closer to hand-tuned code.
>
> However, note that this doesn't necessarily have an impact on nogil code. If you took a borrowed reference in one nogil thread, and a gil-holding thread deletes the object at the same time or during the lifetime of the borrowed reference (e.g. by updating a dict or assigning to a cdef attribute), the nogil thread would end up with a dead pointer in its hands. That's why the usage of borrowed references needs to be explicit in the code ("I know what I'm doing"), and the optimisations require the GIL to be held.

I see, ok. Thanks, that really helped me see the motivation behind it (i.e., the INC/DECREF really is a performance issue for you).

>>>> What if we support acquisition counting for every instance of a cdef class? In Python and Cython GIL mode you use reference counting, and in Cython nogil mode and for struct attributes, array dtypes etc you use acquisition counting. This allows you to pass around cdef objects without the GIL and use their nogil methods. If the acquisition count is greater than 1, the acquisition count owns a reference to the object. If it reaches 0 you discard your owned reference (you can simply acquire the GIL if you don't have it) and when you increment from zero you obtain it. Perhaps something like libatomic could be used to efficiently implement this.
>>>
>>> Where would you store that count? In the object struct? That would increase the size of each instance.
>>
>> Yes, not just the count, also the lock. This feature would be optional and may be very useful for people (I think).
>
> Well, as long as it's an optional feature that requires a class decorator, the only obvious drawback is that it'll bloat the compiler even more than it is already.

Actually, I think it will help the implementation of mutexes and async objects if we want those, and possibly other stuff in the future. The acquisition counting is basically already there (for memoryviews), so it's easy to track down where and when to apply this. However one major problem would be circular acquisition counts, so you'd also have to implement a garbage collector like CPython has (e.g. if you have a cdef class with a cython.parallel.dict). We should just have a real garbage collector instead of all the counting crap. Or we could make it a burden for the user...

I agree that this is really not as feasible as I first thought. It actually shows me a problem where I can have a memoryview object in a memoryview with dtype 'object', although the problem here is that the memoryview object doesn't traverse the object in the Py_buffer, or when coerced from a memoryview slice to a memoryview object, the memoryview slice struct object... I suppose I need to fix that (but I'm not sure how, as you can't provide a manual traverse function in Cython).

But I really believe that these are much-wanted features. If you're using threads in Python you can only get concurrency, not parallelism, unless you release the GIL; even if there is some performance overhead, it will still be a lot better than sequential execution. Perhaps when cython.parallel will be more mature, we may get functionality to specify data distribution schemes and message passing, in which case the GIL won't be a problem. But many things would be harder or much more expensive, e.g. transposing, sending objects etc.

I think I'll just drop this discussion for now. I'm going
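The circular-count problem raised above is the same one CPython's own reference counting has, and it can be observed directly from Python. The sketch below uses only the standard `gc` and `weakref` modules; the `Node` class is made up for the illustration, and the behaviour shown is that of CPython specifically.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()                     # keep the cyclic collector out of the way for now
a, b = Node(), Node()
a.other, b.other = b, a          # a and b keep each other alive: a reference cycle
probe = weakref.ref(a)           # lets us observe a without keeping it alive

del a, b                         # drop our own references
alive_after_del = probe() is not None   # counts never reached zero: still alive

gc.collect()                     # run CPython's cycle detector explicitly
alive_after_gc = probe() is not None    # the cycle has now been reclaimed
gc.enable()
```

Pure counting, whether of references or of acquisitions, leaves the pair allocated forever; only a traversal-based collector reclaims it. That is why acquisition-counted containers holding cdef objects would need either such a collector or user discipline.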
Re: [Cython] Acquisition counted cdef classes
mark florisson, 25.10.2011 18:58:
> On 25 October 2011 12:22, Stefan Behnel wrote:
>> mark florisson, 25.10.2011 11:11:
>>> On 25 October 2011 08:33, Stefan Behnel wrote:
>>>> mark florisson, 24.10.2011 21:50:
>>>>> What if we support acquisition counting for every instance of a cdef class? In Python and Cython GIL mode you use reference counting, and in Cython nogil mode and for structs attributes, array dtypes etc you use acquisition counting. This allows you to pass around cdef objects without the GIL and use their nogil methods. If the acquisition count is greater than 1, the acquisition count owns a reference to the object. If it reaches 0 you discard your owned reference (you can simply acquire the GIL if you don't have it) and when you increment from zero you obtain it. Perhaps something like libatomic could be used to efficiently implement this.
>>>>
>>>> Where would you store that count? In the object struct? That would increase the size of each instance.
>>>
>>> Yes, not just the count, also the lock. This feature would be optional and may be very useful for people (I think).
>>
>> Well, as long as it's an optional feature that requires a class decorator, the only obvious drawback is that it'll bloat the compiler even more than it is already.
>
> Actually, I think it will help the implementation of mutexes and async objects if we want those, and possibly other stuff in the future.

If all you want is to support the regular with statement in nogil blocks, part of that is implemented already. I recently added support for implementing the context manager's __enter__() method as c(p)def method. However, __exit__() isn't there yet, as it's a bit more tricky - maybe taking off a C pointer to the cdef method and calling that, or calling the cdef method directly instead (not sure), but always making sure that there still is a reference to the context manager itself, and eventually freeing it. I'm sure it can be done, though, maybe with some restrictions in nogil mode. If we additionally fix it up to use the exception propagation and try-finally support that you wrote for the with-gil feature, we're basically there.

> The acquisition counting is basically already there (for memoryviews), so it's easy to track down where and when to apply this. However one major problem would be circular acquisition counts, so you'd also have to implement a garbage collector like CPython has (e.g. if you have a cdef class with a cython.parallel.dict). We should just have a real garbage collector instead of all the counting crap. Or we could make it a burden for the user...

Right, these things can grow endlessly. It took CPython something like a dozen years to a) recognise the need for and b) implement a garbage collector. Let's hope that Cython will never get one.

> I agree that this is really not as feasible as I first thought. It actually shows me a problem where I can have a memoryview object in a memoryview with dtype 'object', although the problem here is that the memoryview object doesn't traverse the object in the Py_buffer, or when coerced from a memoryview slice to a memoryview object, the memoryview slice struct object... I suppose I need to fix that (but I'm not sure how, as you can't provide a manual traverse function in Cython).

No, you may have to descend into C here. Or, you could disable a Python object dtype for the time being?

> But I really believe that these are much-wanted features. If you're using threads in Python you can only get concurrency not parallelism, unless you release the GIL, even if there is some performance overhead it will still be a lot better than sequential execution. Perhaps when cython.parallel will be more mature, we may get functionality to specify data distribution schemes and message passing, in which case the GIL won't be a problem. But many things would be harder or much more expensive, e.g. transposing, sending objects etc.

See? That's what I mean with language complexity. These things quickly turn into an open can of worms. I don't think the language should handle any of these. Message passing is up to libraries, for example. If you want language support, use Erlang.

>>>>> The advantages are:
>>>>> 1) allow users to pass around cdef typed objects in nogil mode
>>>>> 2) allow cdef typed objects in as struct attributes or array elements
>>>>> 3) make it easy to implement things like memoryviews (already done but would have been a lot easier), cython.parallel.async/future objects, cython.parallel.mutex objects and possibly other things in the future
>>>>
>>>> Would it really be easier? You can already call cdef methods in nogil mode, AFAIR.
>>>
>>> Sure, but you cannot store cdef objects as struct attributes, array elements (you could implement it with reference counting, but not for nogil mode)
>>
>> You could do that with borrowed references, though, assuming that you keep another reference around (or do your own ref-counting). However, I do see that keeping a real reference around may be hard to do in some cases.
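The acquisition-counting scheme quoted at the top of this message can be modelled in a few lines of plain Python. This is only a sketch under stated assumptions, not what Cython does: the names `AcquisitionCounted`, `acquire` and `release` are hypothetical, a `threading.Lock` stands in for the atomic operations the proposal mentions, an integer stands in for the Python reference count, and the count is taken to own one reference whenever it is at least 1.

```python
import threading


class AcquisitionCounted:
    """Hypothetical model of an object with a refcount and an acquisition count."""

    def __init__(self):
        self.refcount = 1              # stands in for the Python reference count
        self.acquisition_count = 0     # number of GIL-free holders
        self._lock = threading.Lock()  # stands in for atomic operations

    def acquire(self):
        with self._lock:
            if self.acquisition_count == 0:
                # First GIL-free holder: obtain the one owned reference.
                self.refcount += 1
            self.acquisition_count += 1

    def release(self):
        with self._lock:
            if self.acquisition_count == 0:
                raise RuntimeError("release() without matching acquire()")
            self.acquisition_count -= 1
            if self.acquisition_count == 0:
                # Last GIL-free holder gone: discard the owned reference.
                self.refcount -= 1


def hammer(obj, rounds):
    # Each worker repeatedly takes and drops an acquisition.
    for _ in range(rounds):
        obj.acquire()
        obj.release()


obj = AcquisitionCounted()
obj.acquire()                        # keep one holder alive during the run
threads = [threading.Thread(target=hammer, args=(obj, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
held_refcount = obj.refcount         # exactly one extra reference while held
obj.release()
final_refcount = obj.refcount        # back to the original single reference
```

However many holders come and go, the acquisition count contributes at most one reference, which is why only the 0 -> 1 and 1 -> 0 transitions would need the GIL.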
Re: [Cython] Acquisition counted cdef classes
On 25 October 2011 19:10, Stefan Behnel wrote:
> mark florisson, 25.10.2011 18:58:
>> On 25 October 2011 12:22, Stefan Behnel wrote:
>>> mark florisson, 25.10.2011 11:11:
>>>> On 25 October 2011 08:33, Stefan Behnel wrote:
>>>>> mark florisson, 24.10.2011 21:50:
>>>>>> What if we support acquisition counting for every instance of a cdef class? In Python and Cython GIL mode you use reference counting, and in Cython nogil mode and for structs attributes, array dtypes etc you use acquisition counting. This allows you to pass around cdef objects without the GIL and use their nogil methods. If the acquisition count is greater than 1, the acquisition count owns a reference to the object. If it reaches 0 you discard your owned reference (you can simply acquire the GIL if you don't have it) and when you increment from zero you obtain it. Perhaps something like libatomic could be used to efficiently implement this.
>>>>>
>>>>> Where would you store that count? In the object struct? That would increase the size of each instance.
>>>>
>>>> Yes, not just the count, also the lock. This feature would be optional and may be very useful for people (I think).
>>>
>>> Well, as long as it's an optional feature that requires a class decorator, the only obvious drawback is that it'll bloat the compiler even more than it is already.
>>
>> Actually, I think it will help the implementation of mutexes and async objects if we want those, and possibly other stuff in the future.
>
> If all you want is to support the regular with statement in nogil blocks, part of that is implemented already. I recently added support for implementing the context manager's __enter__() method as c(p)def method. However, __exit__() isn't there yet, as it's a bit more tricky - maybe taking off a C pointer to the cdef method and calling that, or calling the cdef method directly instead (not sure), but always making sure that there still is a reference to the context manager itself, and eventually freeing it. I'm sure it can be done, though, maybe with some restrictions in nogil mode. If we additionally fix it up to use the exception propagation and try-finally support that you wrote for the with-gil feature, we're basically there.

Cool. I suppose if you combine that with borrowed references you may just get somewhere implementing the mutexes. On the other hand it won't really be more convenient than passing OpenMP or Python locks around, just slightly more pythonic.

>> The acquisition counting is basically already there (for memoryviews), so it's easy to track down where and when to apply this. However one major problem would be circular acquisition counts, so you'd also have to implement a garbage collector like CPython has (e.g. if you have a cdef class with a cython.parallel.dict). We should just have a real garbage collector instead of all the counting crap. Or we could make it a burden for the user...
>
> Right, these things can grow endlessly. It took CPython something like a dozen years to a) recognise the need for and b) implement a garbage collector. Let's hope that Cython will never get one.
>
>> I agree that this is really not as feasible as I first thought. It actually shows me a problem where I can have a memoryview object in a memoryview with dtype 'object', although the problem here is that the memoryview object doesn't traverse the object in the Py_buffer, or when coerced from a memoryview slice to a memoryview object, the memoryview slice struct object... I suppose I need to fix that (but I'm not sure how, as you can't provide a manual traverse function in Cython).
>
> No, you may have to descend into C here. Or, you could disable a Python object dtype for the time being?

Yes, disabling would be easy, but it should be fixed (at some point). Perhaps I can just override the tp_traverse of the type object in the module init function (and maybe save that pointer and call it from the new function + traverse the Py_buffer). I'm not entirely sure how we support Py_buffer, but it is a built-in thing and it doesn't result in a traverse:

    cdef class X(object):
        cdef Py_buffer view    <- this won't have a traverse function.

Fixing that won't get me there though, I need to do the same thing for memoryview objects wrapping a memoryview struct.

>> But I really believe that these are much-wanted features. If you're using threads in Python you can only get concurrency not parallelism, unless you release the GIL, even if there is some performance overhead it will still be a lot better than sequential execution. Perhaps when cython.parallel will be more mature, we may get functionality to specify data distribution schemes and message passing, in which case the GIL won't be a problem. But many things would be harder or much more expensive, e.g.
Re: [Cython] Acquisition counted cdef classes
On 10/25/2011 06:58 PM, mark florisson wrote:
> On 25 October 2011 12:22, Stefan Behnel wrote:
>> The problem is not so much the INCREF (which is just an indirect add), it's the DECREF, which contains a conditional jump based on an unknown external value, that may trigger external code. That can kill several C compiler optimisations for the surrounding code. (And that would only get worse by using a dedicated locking mechanism.)

What you could do is a form of pseudo-garbage-collection where, when the Cython refcount/acquisition count reaches 0, you enqueue a Python DECREF until you're holding the GIL anyway. If sticking it into the queue is unlikely(), and it is transparent to the compiler that it doesn't dispatch into unknown code.

(And regarding Stefan's comment about Erlang: It's all about available libraries. A language for concurrent computing running on CPython and able to use all the libraries available for CPython would be awesome. It doesn't need to be named Cython -- show me an Erlang port to the CPython platform and I'd perhaps jump ship.)

> Anyway, sorry for the long mail. I agree this is likely not feasible to implement, although I would like the functionality to be there. Perhaps I'm trying to solve problems which don't really need to be solved. Maybe we should just use multiprocessing, or MPI and numpy with global arrays and pickling. Maybe memoryviews could help out with that as well.

Nice conclusion. I think prange was a very nice 80%-there-solution (which is also the way we framed it when starting), but the GIL just creates too many barriers. Real garbage collection is needed, and CPython just isn't there.

What I'd like to see personally is:

- A convenient utility to allocate an array in shared memory, so that when you pickle a view of it and send it to another Python process with multiprocessing and it unpickles, it gets a slice into the same shared memory. People already do this but it's just a lot of jumping through hoops. A good place would probably be in NumPy.

- Decent message passing using ZeroMQ in Cython code without any Python overhead, for fine-grained communication in Cython code in Python processes spawned using multiprocessing. I think this requires some syntax candy in Cython to feel natural enough, but perhaps it can be put into a form that is not ZeroMQ-specific.

Dag Sverre
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
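The first wish, a buffer whose pickled form points back at the same shared memory, can be sketched with the `multiprocessing.shared_memory` module that the Python standard library gained in version 3.8, long after this thread. The `SharedBytes` class and its methods are hypothetical names for illustration, it handles raw bytes rather than a NumPy array, and for brevity both handles live in one process here instead of being sent through multiprocessing.

```python
import pickle
from multiprocessing import shared_memory


class SharedBytes:
    """Hypothetical helper: a byte buffer in shared memory that pickles by name."""

    def __init__(self, size, name=None):
        if name is None:
            # Creating side: allocate a fresh shared-memory block.
            self._shm = shared_memory.SharedMemory(create=True, size=size)
        else:
            # Unpickling side: attach to the existing block by name.
            self._shm = shared_memory.SharedMemory(name=name)
        self.size = size
        # The OS may round the block up to a page; expose only `size` bytes.
        self.buf = self._shm.buf[:size]

    def __reduce__(self):
        # Pickle only the size and the block name, never the data itself.
        return (SharedBytes, (self.size, self._shm.name))

    def close(self):
        # Release our slice first so the underlying mapping can be closed.
        self.buf.release()
        self._shm.close()

    def unlink(self):
        # Free the block itself; call once, from the owning side.
        self._shm.unlink()


owner = SharedBytes(16)
payload = pickle.dumps(owner)   # what multiprocessing would send
view = pickle.loads(payload)    # what the receiving side would get
owner.buf[0] = 42               # write through one handle...
seen_by_view = view.buf[0]      # ...and read it back through the other
view.close()
owner.close()
owner.unlink()
```

The pickle carries only a name and a size, so the cost of sending a handle does not grow with the data. A NumPy array could be laid over `buf` in the same way, which is roughly the "slice into the same shared memory" described above.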
Re: [Cython] Acquisition counted cdef classes
On 10/25/2011 09:01 PM, Dag Sverre Seljebotn wrote:
> On 10/25/2011 06:58 PM, mark florisson wrote:
>> On 25 October 2011 12:22, Stefan Behnel wrote:
>>> The problem is not so much the INCREF (which is just an indirect add), it's the DECREF, which contains a conditional jump based on an unknown external value, that may trigger external code. That can kill several C compiler optimisations for the surrounding code. (And that would only get worse by using a dedicated locking mechanism.)
>
> What you could do is a form of pseudo-garbage-collection where, when the Cython refcount/acquisition count reaches 0, you enqueue a Python DECREF until you're holding the GIL anyway. If sticking it into the queue is unlikely(), and it is transparent to the compiler that it doesn't dispatch into unknown code.

...then the C compiler optimizations should presumably not be killed.

DS

> (And regarding Stefan's comment about Erlang: It's all about available libraries. A language for concurrent computing running on CPython and able to use all the libraries available for CPython would be awesome. It doesn't need to be named Cython -- show me an Erlang port to the CPython platform and I'd perhaps jump ship.)
>
>> Anyway, sorry for the long mail. I agree this is likely not feasible to implement, although I would like the functionality to be there. Perhaps I'm trying to solve problems which don't really need to be solved. Maybe we should just use multiprocessing, or MPI and numpy with global arrays and pickling. Maybe memoryviews could help out with that as well.
>
> Nice conclusion. I think prange was a very nice 80%-there-solution (which is also the way we framed it when starting), but the GIL just creates too many barriers. Real garbage collection is needed, and CPython just isn't there.
>
> What I'd like to see personally is:
>
> - A convenient utility to allocate an array in shared memory, so that when you pickle a view of it and send it to another Python process with multiprocessing and it unpickles, it gets a slice into the same shared memory. People already do this but it's just a lot of jumping through hoops. A good place would probably be in NumPy.
>
> - Decent message passing using ZeroMQ in Cython code without any Python overhead, for fine-grained communication in Cython code in Python processes spawned using multiprocessing. I think this requires some syntax candy in Cython to feel natural enough, but perhaps it can be put into a form that is not ZeroMQ-specific.
>
> Dag Sverre
Re: [Cython] Acquisition counted cdef classes
On 10/25/2011 08:45 PM, mark florisson wrote:
> On 25 October 2011 19:10, Stefan Behnel wrote:
>> See? That's what I mean with language complexity. These things quickly turn into an open can of worms. I don't think the language should handle any of these. Message passing is up to libraries, for example. If you want language support, use Erlang.
>
> I haven't used Erlang (though I should give it a go), but I find that built-in support for these things just ends up to be much more elegant. MPI (and possibly zeromq) just look terrible and complicated if you compare them to Unified Parallel C, High Performance Fortran or

Using libraries for message passing is sort of like doing complex string manipulation only using malloc, free, and string.h :-)

> Co-Array Fortran. I don't know about Go channels. This doesn't mean that we should support it, but we might consider it.

I think you should definitely read up on Go channels, they're just like what I'd like to write in Cython.

Dag Sverre
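For readers who have not met channels, the programming model can be approximated with Python's standard `queue.Queue`. This is a loose analogy only, not Go semantics and not a proposed Cython syntax: the bounded queue plays the role of a buffered channel, and the `producer`, `consumer` and `DONE` names are our own convention. The threads here still run under the GIL, so the sketch shows the style of communication rather than any parallel speedup.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream (our own convention)


def producer(channel, n):
    # Send n messages, then signal that no more will follow.
    for i in range(n):
        channel.put(i * i)    # blocks while the bounded queue is full
    channel.put(DONE)


def consumer(channel, results):
    # Receive messages until the sentinel arrives.
    while True:
        item = channel.get()  # blocks until a message is available
        if item is DONE:
            break
        results.append(item)


channel = queue.Queue(maxsize=2)  # small buffer, like a buffered channel
results = []
workers = [
    threading.Thread(target=producer, args=(channel, 5)),
    threading.Thread(target=consumer, args=(channel, results)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The two sides share nothing except the channel, so neither needs an explicit lock of its own; that is the property that makes this style attractive compared with hand-written locking.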
Re: [Cython] Acquisition counted cdef classes
On 25 October 2011 20:01, Dag Sverre Seljebotn wrote:
> On 10/25/2011 06:58 PM, mark florisson wrote:
>> On 25 October 2011 12:22, Stefan Behnel wrote:
>>> The problem is not so much the INCREF (which is just an indirect add), it's the DECREF, which contains a conditional jump based on an unknown external value, that may trigger external code. That can kill several C compiler optimisations for the surrounding code. (And that would only get worse by using a dedicated locking mechanism.)
>
> What you could do is a form of pseudo-garbage-collection where, when the Cython refcount/acquisition count reaches 0, you enqueue a Python DECREF until you're holding the GIL anyway. If sticking it into the queue is unlikely(), and it is transparent to the compiler that it doesn't dispatch into unknown code.

I thought about that as well, but the problem is that you can only defer the DECREF to a garbage collector if your acquisition count reaches zero and your reference count is one. However, you may reach an acquisition count of zero with a reference count > 1, which means you could have the following race:

1) acquisition count reaches zero, a DECREF is pending in the garbage collector thread
2) you obtain a nonzero acquisition count from the object (e.g. by assigning a non-typed to a typed variable)
3) you lose your acquisition count again, another DECREF should be pending
4) the garbage collector figures out it needs to DECREF (it should actually do this twice)

Now, you could keep a counter for how many times that happens, but that will likely not be better than an immediate DECREF. In short, reference counting is terrible. I think unlikely() will help the compiler here as you said though, and your processor will have branch prediction, out of order execution and conditional instructions which may all help.

> (And regarding Stefan's comment about Erlang: It's all about available libraries. A language for concurrent computing running on CPython and able to use all the libraries available for CPython would be awesome. It doesn't need to be named Cython -- show me an Erlang port to the CPython platform and I'd perhaps jump ship.)
>
>> Anyway, sorry for the long mail. I agree this is likely not feasible to implement, although I would like the functionality to be there. Perhaps I'm trying to solve problems which don't really need to be solved. Maybe we should just use multiprocessing, or MPI and numpy with global arrays and pickling. Maybe memoryviews could help out with that as well.
>
> Nice conclusion. I think prange was a very nice 80%-there-solution (which is also the way we framed it when starting), but the GIL just creates too many barriers. Real garbage collection is needed, and CPython just isn't there.
>
> What I'd like to see personally is:
>
> - A convenient utility to allocate an array in shared memory, so that when you pickle a view of it and send it to another Python process with multiprocessing and it unpickles, it gets a slice into the same shared memory. People already do this but it's just a lot of jumping through hoops. A good place would probably be in NumPy.

I haven't used it myself, but can the global array support help in that regard?

> - Decent message passing using ZeroMQ in Cython code without any Python overhead, for fine-grained communication in Cython code in Python processes spawned using multiprocessing. I think this requires some syntax candy in Cython to feel natural enough, but perhaps it can be put on a form so that it is not ZeroMQ-specific.
>
> Dag Sverre
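The four-step race in this message, and the "keep a counter" variant it mentions, can be traced in a small Python model. Everything here is an assumption for illustration: `Obj`, `DeferredDecrefs`, `acquire` and `release_nogil` are hypothetical names, plain integers stand in for the real reference and acquisition counts, and "holding the GIL" is only a comment, since this code runs in a single thread.

```python
import threading
from collections import Counter


class Obj:
    """Hypothetical object with a refcount and an acquisition count."""

    def __init__(self):
        self.refcount = 1
        self.acquisition_count = 0


class DeferredDecrefs:
    """Hypothetical queue of DECREFs waiting for a GIL holder."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = Counter()  # object -> number of owed DECREFs

    def defer(self, obj):
        with self._lock:
            self._pending[obj] += 1

    def drain(self):
        """Apply every owed DECREF; meant to run only while holding the GIL."""
        with self._lock:
            owed, self._pending = self._pending, Counter()
        for obj, count in owed.items():
            obj.refcount -= count
        return sum(owed.values())


def acquire(obj):
    # With the GIL: the 0 -> 1 transition takes an owned reference.
    if obj.acquisition_count == 0:
        obj.refcount += 1
    obj.acquisition_count += 1


def release_nogil(obj, deferred):
    # Without the GIL: the 1 -> 0 transition cannot DECREF, so record a debt.
    obj.acquisition_count -= 1
    if obj.acquisition_count == 0:
        deferred.defer(obj)


deferred = DeferredDecrefs()
obj = Obj()
acquire(obj)                    # owned reference taken
release_nogil(obj, deferred)    # step 1: one DECREF pending
acquire(obj)                    # step 2: count leaves zero again
release_nogil(obj, deferred)    # step 3: a second DECREF pending
refcount_before_drain = obj.refcount
applied = deferred.drain()      # step 4: both owed DECREFs applied
```

With a per-object counter the collector applies the right number of DECREFs, but every release still pays for a lock and a table update, which matches the remark above that this is unlikely to beat an immediate DECREF.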
Re: [Cython] Acquisition counted cdef classes
On 25 October 2011 20:15, Dag Sverre Seljebotn wrote:
> On 10/25/2011 08:45 PM, mark florisson wrote:
>> On 25 October 2011 19:10, Stefan Behnel wrote:
>>> See? That's what I mean with language complexity. These things quickly turn into an open can of worms. I don't think the language should handle any of these. Message passing is up to libraries, for example. If you want language support, use Erlang.
>>
>> I haven't used Erlang (though I should give it a go), but I find that built-in support for these things just ends up to be much more elegant. MPI (and possibly zeromq) just look terrible and complicated if you compare them to Unified Parallel C, High Performance Fortran or
>
> Using libraries for message passing is sort of like doing complex string manipulation only using malloc, free, and string.h :-)
>
>> Co-Array Fortran. I don't know about Go channels. This doesn't mean that we should support it, but we might consider it.
>
> I think you should definitely read up on Go channels, they're just like what I'd like to write in Cython.

That's a good motivator :) I'll do that.

> Dag Sverre
Re: [Cython] Acquisition counted cdef classes
Dag Sverre Seljebotn wrote:
> I'd gladly take a factor two (or even four) slowdown of CPython code any day to get rid of the GIL :-). The thing is, sometimes one has 48 cores and considers a 10x speedup better than nothing...

Another thing to consider is that locking around refcount changes may not be as expensive in typical Cython code as it is in Python.

The trouble with Python is that you can't so much as scratch your nose without touching a big pile of ref counts. But if the Cython code is only dealing with a few Python objects and doing most of its work at the C level, the relative overhead of locking around refcount changes may not be significant.

So it may be worth trying the strategy of just acquiring the GIL whenever a refcount needs to be changed in a nogil section, and damn the consequences.

--
Greg