[Cython] Problem with Py_buffer struct definition in Builtin.py
Hi, the builtin Py_buffer struct type is currently defined as follows: """ builtin_structs_table = [ ('Py_buffer', 'Py_buffer', [("buf",PyrexTypes.c_void_ptr_type), ("obj",PyrexTypes.py_object_type), ("len",PyrexTypes.c_py_ssize_t_type), ... """ I hadn't noticed this before, but when you actually use it in a "__getbuffer__()" special method, you have to assign the buffer owner (i.e. self) to the .obj field, which is currently defined as "object". Meaning, Cython will do reference counting for it. That's fine for the self reference being assigned to it. However, when "__getbuffer__()" gets called, .obj is not assigned, i.e. it may be NULL, or it may contain garbage, depending on the caller. So, Cython will generate broken code for it when trying to DECREF the original value as part of the assignment. """ * buffer.obj = self # << */ __Pyx_INCREF(((PyObject *)__pyx_v_self)); __Pyx_GIVEREF(((PyObject *)__pyx_v_self)); __Pyx_GOTREF(__pyx_v_buffer->obj); __Pyx_DECREF(__pyx_v_buffer->obj); // please crash for me __pyx_v_buffer->obj = ((PyObject *)__pyx_v_self); """ I see a couple of ways to fix this: 1) Change the field to PyObject* instead of object (and finally make PyObject a known type in Cython). Problem: may break existing code, although that's currently broken anyway. Also makes it more annoying for users who then have to do the manual cast+INCREF dance. 2) Special case this field and do not do any reference counting for its current value. PyBuffer_Release() will properly do the XDECREF cleanup anyway, and it's unlikely that user code will assign to it more than once. Problem: very unexpected behaviour for users in case they ever *do* multiple assignments. 3) Wrap the user provided "__getbuffer__()" special method in code that always assigns "self" to the .obj field before entering the user code and also clears it on error return. That would mean that it will always contain a valid value, so that ref-counting will work as expected. Problem: this complicates the implementation and may not be obvious for users (who don't read the documentation). It also won't fix any code that uses Py_buffer outside of "__getbuffer__()" or that calls "__releasebuffer__()" directly without going through PyBuffer_Release(). However, this actually makes it easier for users, who can then even skip setting the value in all but some very rare cases and just care about the actual buffer setup in their "__getbuffer__()" code. I'm leaning towards 3). What do you think? I also think that this is worth fixing for 0.16 if possible, because it currently generates crash code from innocent and correct looking user code. Stefan ___ cython-devel mailing list cython-devel@python.org http://mail.python.org/mailman/listinfo/cython-devel
Re: [Cython] Problem with Py_buffer struct definition in Builtin.py
Stefan Behnel, 02.03.2012 10:45: > the builtin Py_buffer struct type is currently defined as follows: > > """ > builtin_structs_table = [ > ('Py_buffer', 'Py_buffer', > [("buf",PyrexTypes.c_void_ptr_type), > ("obj",PyrexTypes.py_object_type), > ("len",PyrexTypes.c_py_ssize_t_type), > ... > """ > > I hadn't noticed this before, but when you actually use it in a > "__getbuffer__()" special method, you have to assign the buffer owner (i.e. > self) to the .obj field, which is currently defined as "object". Oh, well, I should really learn to read code before composing a lengthy e-mail... "__getbuffer__()" is already special-cased and sets the value to None. I think I even recall that we discussed this back when Dag implemented support for buffers. The reason I had originally noticed this was this recent change in Py3.3: """ static PyObject * _PyManagedBuffer_FromObject(PyObject *base) { _PyManagedBufferObject *mbuf; mbuf = mbuf_alloc(); if (mbuf == NULL) return NULL; if (PyObject_GetBuffer(base, &mbuf->master, PyBUF_FULL_RO) < 0) { /* mbuf->master.obj must be NULL. */ Py_DECREF(mbuf); return NULL; } /* Assume that master.obj is a new reference to base. */ assert(mbuf->master.obj == base); return (PyObject *)mbuf; } """ Note the assertion (which is unreleased as of now, so it may still be subject to changes). Is there any reason the value should be set to None by Cython's special casing code instead of self? Stefan ___ cython-devel mailing list cython-devel@python.org http://mail.python.org/mailman/listinfo/cython-devel
Re: [Cython] Cython 0.16 and ndarray fields deprecation
On 1 March 2012 16:18, Dag Sverre Seljebotn wrote: > On 03/01/2012 04:03 AM, mark florisson wrote: >> >> On 29 February 2012 17:57, Dag Sverre Seljebotn >> wrote: >>> >>> On 02/29/2012 09:42 AM, Stefan Behnel wrote: Dag Sverre Seljebotn, 29.02.2012 18:06: > > > I'm wondering what the best course of action for deprecating the shape > field in numpy.pxd is. > > The thing is, currently "shape" really gets in the way. In most > situations > it is OK with slow access to shape through the Python layer, and > "arr.shape[0]" is often just fine, but currently one is in a situation > where one must either write "(arr).shape[0])" or > "np.PyArray_DIMS(arr)[0]", or be faced with code that isn't > forward-compatible with NumPy. Can Cython emulate this at the C layer? And even your work-around for the Python object access looks more like a Cython bug to me. I wouldn't know why that can't "just work". It usually works for other undeclared Python attributes of "anything", so it might just as well be made to work here. >>> >>> >>> >>> Well, the problem is that shape is currently declared as a C field. It is >>> also available as a Python attribute. Usually the user doesn't care which >>> one is used, but the C field is declared for the few cases where access >>> is >>> speed-critical. >>> >>> Though even with current NumPy, I find myself doing "print >>> (arr).shape" in order to get a tuple rather than a Py_ssize_t*... >>> >>> > It would really be good to do the transition as fast as possible, so > that > all Cython code eventually becomes ready for upcoming NumPy releases. But it previously worked, right? It's just no longer supported in newer NumPy versions IIUC? If that's the case, deleting it would break otherwise working code. No-one forces you to switch to the latest NumPy version, after all, and certainly not right now. A warning is much better. >>> >>> >>> >>> It previously worked, but it turns out that it was always frowned-upon. I >>> didn't know that when I added the fields, and it was a convenient way of >>> speeding things up... >> >> >> I would personally prefer either cdef nogil extension class properties >> (needs compiler support) or just special-casing in the compiler, which >> shouldn't be too hard I think. Warnings would be a first step, but the >> linkage of ndarray attributes to C attributes is really an >> implementation detail, so it's better to keep supporting the >> attributes correctly. > > > So you are saying we (somehow) stick with supporting "arr.shape[0]" in the > future, and perhaps even support "print arr.shape"? (+ arr.dim, > arr.strides). Exactly how we could figure out at PyCon. To remain consistent with previous versions the former should be supported and the latter would be a bonus (and it wouldn't be too hard anyway). > I'm anyway leaning towards deprecating arr.data, as it's too different from > what the Python attribute does. +1 for that, just write &arr[0] as Sturla mentioned. The transition should be trivial. > Reason I'm asking is that I'm giving a talk on Saturday, and I don't want to > teach people bad habits -- so we must figure out what the bad habits are :-) > (I think this applies for the PyCon poster as well...) > > [1] PyData workshop at Google's offices in Mountain View; the event was open > for all but now it is full with a long waiting list, which is why I didn't > announce it. http://pydataworkshop.eventbrite.com/ > > > Dag > ___ > cython-devel mailing list > cython-devel@python.org > http://mail.python.org/mailman/listinfo/cython-devel ___ cython-devel mailing list cython-devel@python.org http://mail.python.org/mailman/listinfo/cython-devel
Re: [Cython] Problem with Py_buffer struct definition in Builtin.py
On 2 March 2012 10:09, Stefan Behnel wrote: > Stefan Behnel, 02.03.2012 10:45: >> the builtin Py_buffer struct type is currently defined as follows: >> >> """ >> builtin_structs_table = [ >> ('Py_buffer', 'Py_buffer', >> [("buf", PyrexTypes.c_void_ptr_type), >> ("obj", PyrexTypes.py_object_type), >> ("len", PyrexTypes.c_py_ssize_t_type), >> ... >> """ >> >> I hadn't noticed this before, but when you actually use it in a >> "__getbuffer__()" special method, you have to assign the buffer owner (i.e. >> self) to the .obj field, which is currently defined as "object". > > Oh, well, I should really learn to read code before composing a lengthy > e-mail... > > "__getbuffer__()" is already special-cased and sets the value to None. I > think I even recall that we discussed this back when Dag implemented > support for buffers. The reason I had originally noticed this was this > recent change in Py3.3: > > """ > static PyObject * > _PyManagedBuffer_FromObject(PyObject *base) > { > _PyManagedBufferObject *mbuf; > > mbuf = mbuf_alloc(); > if (mbuf == NULL) > return NULL; > > if (PyObject_GetBuffer(base, &mbuf->master, PyBUF_FULL_RO) < 0) { > /* mbuf->master.obj must be NULL. */ > Py_DECREF(mbuf); > return NULL; > } > > /* Assume that master.obj is a new reference to base. */ > assert(mbuf->master.obj == base); > > return (PyObject *)mbuf; > } > """ > > Note the assertion (which is unreleased as of now, so it may still be > subject to changes). Is there any reason the value should be set to None by > Cython's special casing code instead of self? It sets it to None to it can later reset it to NULL. Python 3.3 is the first version to document the 'obj' Py_buffer attribute, and it mentions that the exporter may set it to NULL. The code above is not the PyObject_GetBuffer function, so setting it to NULL should still work. Defaulting it to 'self' instead would work equally well I'd think. > Stefan > ___ > cython-devel mailing list > cython-devel@python.org > http://mail.python.org/mailman/listinfo/cython-devel ___ cython-devel mailing list cython-devel@python.org http://mail.python.org/mailman/listinfo/cython-devel
Re: [Cython] Cython 0.16 and ndarray fields deprecation
On 1 March 2012 19:16, Sturla Molden wrote: > On 01.03.2012 19:33, Dag Sverre Seljebotn wrote: >> >> >> Yeah, I proposed this on another thread as one of the options, but the >> support wasn't overwhelming at the time... > > > I think it is worse to break parts of it, thus introducing bugs that might > go silent for a long time. The point is that we would remain fully backwards compatible (except for the data attribute perhaps) and support the attributes portably across numpy versions, nothing would be broken or silent. > Rather deprecate the whole ndarray interface. > As much as I would like that, you can't just break everyone's code in a new release. I think the syntax should be removed several versions later, or maybe even wait until Cython 1.0. > Sturla > > ___ > cython-devel mailing list > cython-devel@python.org > http://mail.python.org/mailman/listinfo/cython-devel ___ cython-devel mailing list cython-devel@python.org http://mail.python.org/mailman/listinfo/cython-devel
Re: [Cython] Cython 0.16 and ndarray fields deprecation
On Fri, Mar 2, 2012 at 8:29 AM, mark florisson wrote: > On 1 March 2012 16:18, Dag Sverre Seljebotn > wrote: >> On 03/01/2012 04:03 AM, mark florisson wrote: >>> >>> On 29 February 2012 17:57, Dag Sverre Seljebotn >>> wrote: On 02/29/2012 09:42 AM, Stefan Behnel wrote: > > > Dag Sverre Seljebotn, 29.02.2012 18:06: >> >> >> I'm wondering what the best course of action for deprecating the shape >> field in numpy.pxd is. >> >> The thing is, currently "shape" really gets in the way. In most >> situations >> it is OK with slow access to shape through the Python layer, and >> "arr.shape[0]" is often just fine, but currently one is in a situation >> where one must either write "(arr).shape[0])" or >> "np.PyArray_DIMS(arr)[0]", or be faced with code that isn't >> forward-compatible with NumPy. > > > > Can Cython emulate this at the C layer? And even your work-around for > the > Python object access looks more like a Cython bug to me. I wouldn't know > why that can't "just work". It usually works for other undeclared Python > attributes of "anything", so it might just as well be made to work here. Well, the problem is that shape is currently declared as a C field. It is also available as a Python attribute. Usually the user doesn't care which one is used, but the C field is declared for the few cases where access is speed-critical. Though even with current NumPy, I find myself doing "print (arr).shape" in order to get a tuple rather than a Py_ssize_t*... >> It would really be good to do the transition as fast as possible, so >> that >> all Cython code eventually becomes ready for upcoming NumPy releases. > > > > But it previously worked, right? It's just no longer supported in newer > NumPy versions IIUC? If that's the case, deleting it would break > otherwise > working code. No-one forces you to switch to the latest NumPy version, > after all, and certainly not right now. A warning is much better. It previously worked, but it turns out that it was always frowned-upon. I didn't know that when I added the fields, and it was a convenient way of speeding things up... >>> >>> >>> I would personally prefer either cdef nogil extension class properties >>> (needs compiler support) or just special-casing in the compiler, which >>> shouldn't be too hard I think. Warnings would be a first step, but the >>> linkage of ndarray attributes to C attributes is really an >>> implementation detail, so it's better to keep supporting the >>> attributes correctly. >> >> >> So you are saying we (somehow) stick with supporting "arr.shape[0]" in the >> future, and perhaps even support "print arr.shape"? (+ arr.dim, >> arr.strides). Exactly how we could figure out at PyCon. > > To remain consistent with previous versions the former should be > supported and the latter would be a bonus (and it wouldn't be too hard > anyway). > >> I'm anyway leaning towards deprecating arr.data, as it's too different from >> what the Python attribute does. > > +1 for that, just write &arr[0] as Sturla mentioned. The transition > should be trivial. If there's a confusion due to .data already having a certain meaning with the python attribute, perhaps it would make sense to have an attribute with a different name, eg. .ptr or .voidptr ? IMHO writing &arr[0] looks like a workaround of some kind. Like, when in C you had something like a 2d array and you'd need to interpret it as a 1d array you'd write &arr[0][0], but C array syntax doesn't support attributes which you can add here. Unless of course the idea is to make arrays to behave and look exactly like C counterparts. Dimitri. > >> Reason I'm asking is that I'm giving a talk on Saturday, and I don't want to >> teach people bad habits -- so we must figure out what the bad habits are :-) >> (I think this applies for the PyCon poster as well...) >> >> [1] PyData workshop at Google's offices in Mountain View; the event was open >> for all but now it is full with a long waiting list, which is why I didn't >> announce it. http://pydataworkshop.eventbrite.com/ >> >> >> Dag >> ___ >> cython-devel mailing list >> cython-devel@python.org >> http://mail.python.org/mailman/listinfo/cython-devel > ___ > cython-devel mailing list > cython-devel@python.org > http://mail.python.org/mailman/listinfo/cython-devel ___ cython-devel mailing list cython-devel@python.org http://mail.python.org/mailman/listinfo/cython-devel