[Cython] Problem with Py_buffer struct definition in Builtin.py

2012-03-02 Thread Stefan Behnel
Hi,

the builtin Py_buffer struct type is currently defined as follows:

"""
builtin_structs_table = [
('Py_buffer', 'Py_buffer',
 [("buf",PyrexTypes.c_void_ptr_type),
  ("obj",PyrexTypes.py_object_type),
  ("len",PyrexTypes.c_py_ssize_t_type),
  ...
"""

I hadn't noticed this before, but when you actually use it in a
"__getbuffer__()" special method, you have to assign the buffer owner (i.e.
self) to the .obj field, which is currently defined as "object". Meaning,
Cython will do reference counting for it. That's fine for the self
reference being assigned to it. However, when "__getbuffer__()" gets
called, .obj is not assigned, i.e. it may be NULL, or it may contain
garbage, depending on the caller. So, Cython will generate broken code for
it when trying to DECREF the original value as part of the assignment.

"""
 * buffer.obj = self # <<
 */
  __Pyx_INCREF(((PyObject *)__pyx_v_self));
  __Pyx_GIVEREF(((PyObject *)__pyx_v_self));
  __Pyx_GOTREF(__pyx_v_buffer->obj);
  __Pyx_DECREF(__pyx_v_buffer->obj);   // please crash for me
  __pyx_v_buffer->obj = ((PyObject *)__pyx_v_self);
"""

I see a couple of ways to fix this:

1) Change the field to PyObject* instead of object (and finally make
PyObject a known type in Cython). Problem: may break existing code,
although that's currently broken anyway. Also makes it more annoying for
users who then have to do the manual cast+INCREF dance.

2) Special case this field and do not do any reference counting for its
current value. PyBuffer_Release() will properly do the XDECREF cleanup
anyway, and it's unlikely that user code will assign to it more than once.
Problem: very unexpected behaviour for users in case they ever *do*
multiple assignments.

3) Wrap the user provided "__getbuffer__()" special method in code that
always assigns "self" to the .obj field before entering the user code and
also clears it on error return. That would mean that it will always contain
a valid value, so that ref-counting will work as expected. Problem: this
complicates the implementation and may not be obvious for users (who don't
read the documentation). It also won't fix any code that uses Py_buffer
outside of "__getbuffer__()" or that calls "__releasebuffer__()" directly
without going through PyBuffer_Release(). However, this actually makes it
easier for users, who can then even skip setting the value in all but some
very rare cases and just care about the actual buffer setup in their
"__getbuffer__()" code.

I'm leaning towards 3). What do you think?

I also think that this is worth fixing for 0.16 if possible, because it
currently generates crash code from innocent and correct looking user code.

Stefan
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Problem with Py_buffer struct definition in Builtin.py

2012-03-02 Thread Stefan Behnel
Stefan Behnel, 02.03.2012 10:45:
> the builtin Py_buffer struct type is currently defined as follows:
> 
> """
> builtin_structs_table = [
> ('Py_buffer', 'Py_buffer',
>  [("buf",PyrexTypes.c_void_ptr_type),
>   ("obj",PyrexTypes.py_object_type),
>   ("len",PyrexTypes.c_py_ssize_t_type),
>   ...
> """
> 
> I hadn't noticed this before, but when you actually use it in a
> "__getbuffer__()" special method, you have to assign the buffer owner (i.e.
> self) to the .obj field, which is currently defined as "object".

Oh, well, I should really learn to read code before composing a lengthy
e-mail...

"__getbuffer__()" is already special-cased and sets the value to None. I
think I even recall that we discussed this back when Dag implemented
support for buffers. The reason I had originally noticed this was this
recent change in Py3.3:

"""
static PyObject *
_PyManagedBuffer_FromObject(PyObject *base)
{
_PyManagedBufferObject *mbuf;

mbuf = mbuf_alloc();
if (mbuf == NULL)
return NULL;

if (PyObject_GetBuffer(base, &mbuf->master, PyBUF_FULL_RO) < 0) {
/* mbuf->master.obj must be NULL. */
Py_DECREF(mbuf);
return NULL;
}

/* Assume that master.obj is a new reference to base. */
assert(mbuf->master.obj == base);

return (PyObject *)mbuf;
}
"""

Note the assertion (which is unreleased as of now, so it may still be
subject to changes). Is there any reason the value should be set to None by
Cython's special casing code instead of self?

Stefan
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Cython 0.16 and ndarray fields deprecation

2012-03-02 Thread mark florisson
On 1 March 2012 16:18, Dag Sverre Seljebotn  wrote:
> On 03/01/2012 04:03 AM, mark florisson wrote:
>>
>> On 29 February 2012 17:57, Dag Sverre Seljebotn
>>   wrote:
>>>
>>> On 02/29/2012 09:42 AM, Stefan Behnel wrote:


 Dag Sverre Seljebotn, 29.02.2012 18:06:
>
>
> I'm wondering what the best course of action for deprecating the shape
> field in numpy.pxd is.
>
> The thing is, currently "shape" really gets in the way. In most
> situations
> it is OK with slow access to shape through the Python layer, and
> "arr.shape[0]" is often just fine, but currently one is in a situation
> where one must either write "(arr).shape[0])" or
> "np.PyArray_DIMS(arr)[0]", or be faced with code that isn't
> forward-compatible with NumPy.



 Can Cython emulate this at the C layer? And even your work-around for
 the
 Python object access looks more like a Cython bug to me. I wouldn't know
 why that can't "just work". It usually works for other undeclared Python
 attributes of "anything", so it might just as well be made to work here.
>>>
>>>
>>>
>>> Well, the problem is that shape is currently declared as a C field. It is
>>> also available as a Python attribute. Usually the user doesn't care which
>>> one is used, but the C field is declared for the few cases where access
>>> is
>>> speed-critical.
>>>
>>> Though even with current NumPy, I find myself doing "print
>>> (arr).shape" in order to get a tuple rather than a Py_ssize_t*...
>>>
>>>
> It would really be good to do the transition as fast as possible, so
> that
> all Cython code eventually becomes ready for upcoming NumPy releases.



 But it previously worked, right? It's just no longer supported in newer
 NumPy versions IIUC? If that's the case, deleting it would break
 otherwise
 working code. No-one forces you to switch to the latest NumPy version,
 after all, and certainly not right now. A warning is much better.
>>>
>>>
>>>
>>> It previously worked, but it turns out that it was always frowned-upon. I
>>> didn't know that when I added the fields, and it was a convenient way of
>>> speeding things up...
>>
>>
>> I would personally prefer either cdef nogil extension class properties
>> (needs compiler support) or just special-casing in the compiler, which
>> shouldn't be too hard I think. Warnings would be a first step, but the
>> linkage of ndarray attributes to C attributes is really an
>> implementation detail, so it's better to keep supporting the
>> attributes correctly.
>
>
> So you are saying we (somehow) stick with supporting "arr.shape[0]" in the
> future, and perhaps even support "print arr.shape"? (+ arr.dim,
> arr.strides). Exactly how we could figure out at PyCon.

To remain consistent with previous versions the former should be
supported and the latter would be a bonus (and it wouldn't be too hard
anyway).

> I'm anyway leaning towards deprecating arr.data, as it's too different from
> what the Python attribute does.

+1 for that, just write &arr[0] as Sturla mentioned. The transition
should be trivial.

> Reason I'm asking is that I'm giving a talk on Saturday, and I don't want to
> teach people bad habits -- so we must figure out what the bad habits are :-)
> (I think this applies for the PyCon poster as well...)
>
> [1] PyData workshop at Google's offices in Mountain View; the event was open
> for all but now it is full with a long waiting list, which is why I didn't
> announce it. http://pydataworkshop.eventbrite.com/
>
>
> Dag
> ___
> cython-devel mailing list
> cython-devel@python.org
> http://mail.python.org/mailman/listinfo/cython-devel
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Problem with Py_buffer struct definition in Builtin.py

2012-03-02 Thread mark florisson
On 2 March 2012 10:09, Stefan Behnel  wrote:
> Stefan Behnel, 02.03.2012 10:45:
>> the builtin Py_buffer struct type is currently defined as follows:
>>
>> """
>> builtin_structs_table = [
>>     ('Py_buffer', 'Py_buffer',
>>      [("buf",        PyrexTypes.c_void_ptr_type),
>>       ("obj",        PyrexTypes.py_object_type),
>>       ("len",        PyrexTypes.c_py_ssize_t_type),
>>       ...
>> """
>>
>> I hadn't noticed this before, but when you actually use it in a
>> "__getbuffer__()" special method, you have to assign the buffer owner (i.e.
>> self) to the .obj field, which is currently defined as "object".
>
> Oh, well, I should really learn to read code before composing a lengthy
> e-mail...
>
> "__getbuffer__()" is already special-cased and sets the value to None. I
> think I even recall that we discussed this back when Dag implemented
> support for buffers. The reason I had originally noticed this was this
> recent change in Py3.3:
>
> """
> static PyObject *
> _PyManagedBuffer_FromObject(PyObject *base)
> {
>    _PyManagedBufferObject *mbuf;
>
>    mbuf = mbuf_alloc();
>    if (mbuf == NULL)
>        return NULL;
>
>    if (PyObject_GetBuffer(base, &mbuf->master, PyBUF_FULL_RO) < 0) {
>        /* mbuf->master.obj must be NULL. */
>        Py_DECREF(mbuf);
>        return NULL;
>    }
>
>    /* Assume that master.obj is a new reference to base. */
>    assert(mbuf->master.obj == base);
>
>    return (PyObject *)mbuf;
> }
> """
>
> Note the assertion (which is unreleased as of now, so it may still be
> subject to changes). Is there any reason the value should be set to None by
> Cython's special casing code instead of self?

It sets it to None to it can later reset it to NULL. Python 3.3 is the
first version to document the 'obj' Py_buffer attribute, and it
mentions that the exporter may set it to NULL. The code above is not
the PyObject_GetBuffer function, so setting it to NULL should still
work. Defaulting it to 'self' instead would work equally well I'd
think.

> Stefan
> ___
> cython-devel mailing list
> cython-devel@python.org
> http://mail.python.org/mailman/listinfo/cython-devel
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Cython 0.16 and ndarray fields deprecation

2012-03-02 Thread mark florisson
On 1 March 2012 19:16, Sturla Molden  wrote:
> On 01.03.2012 19:33, Dag Sverre Seljebotn wrote:
>>
>>
>> Yeah, I proposed this on another thread as one of the options, but the
>> support wasn't overwhelming at the time...
>
>
> I think it is worse to break parts of it, thus introducing bugs that might
> go silent for a long time.

The point is that we would remain fully backwards compatible (except
for the data attribute perhaps) and support the attributes portably
across numpy versions, nothing would be broken or silent.

> Rather deprecate the whole ndarray interface.
>

As much as I would like that, you can't just break everyone's code in
a new release. I think the syntax should be removed several versions
later, or maybe even wait until Cython 1.0.

> Sturla
>
> ___
> cython-devel mailing list
> cython-devel@python.org
> http://mail.python.org/mailman/listinfo/cython-devel
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Cython 0.16 and ndarray fields deprecation

2012-03-02 Thread Dimitri Tcaciuc
On Fri, Mar 2, 2012 at 8:29 AM, mark florisson
 wrote:
> On 1 March 2012 16:18, Dag Sverre Seljebotn  
> wrote:
>> On 03/01/2012 04:03 AM, mark florisson wrote:
>>>
>>> On 29 February 2012 17:57, Dag Sverre Seljebotn
>>>   wrote:

 On 02/29/2012 09:42 AM, Stefan Behnel wrote:
>
>
> Dag Sverre Seljebotn, 29.02.2012 18:06:
>>
>>
>> I'm wondering what the best course of action for deprecating the shape
>> field in numpy.pxd is.
>>
>> The thing is, currently "shape" really gets in the way. In most
>> situations
>> it is OK with slow access to shape through the Python layer, and
>> "arr.shape[0]" is often just fine, but currently one is in a situation
>> where one must either write "(arr).shape[0])" or
>> "np.PyArray_DIMS(arr)[0]", or be faced with code that isn't
>> forward-compatible with NumPy.
>
>
>
> Can Cython emulate this at the C layer? And even your work-around for
> the
> Python object access looks more like a Cython bug to me. I wouldn't know
> why that can't "just work". It usually works for other undeclared Python
> attributes of "anything", so it might just as well be made to work here.



 Well, the problem is that shape is currently declared as a C field. It is
 also available as a Python attribute. Usually the user doesn't care which
 one is used, but the C field is declared for the few cases where access
 is
 speed-critical.

 Though even with current NumPy, I find myself doing "print
 (arr).shape" in order to get a tuple rather than a Py_ssize_t*...


>> It would really be good to do the transition as fast as possible, so
>> that
>> all Cython code eventually becomes ready for upcoming NumPy releases.
>
>
>
> But it previously worked, right? It's just no longer supported in newer
> NumPy versions IIUC? If that's the case, deleting it would break
> otherwise
> working code. No-one forces you to switch to the latest NumPy version,
> after all, and certainly not right now. A warning is much better.



 It previously worked, but it turns out that it was always frowned-upon. I
 didn't know that when I added the fields, and it was a convenient way of
 speeding things up...
>>>
>>>
>>> I would personally prefer either cdef nogil extension class properties
>>> (needs compiler support) or just special-casing in the compiler, which
>>> shouldn't be too hard I think. Warnings would be a first step, but the
>>> linkage of ndarray attributes to C attributes is really an
>>> implementation detail, so it's better to keep supporting the
>>> attributes correctly.
>>
>>
>> So you are saying we (somehow) stick with supporting "arr.shape[0]" in the
>> future, and perhaps even support "print arr.shape"? (+ arr.dim,
>> arr.strides). Exactly how we could figure out at PyCon.
>
> To remain consistent with previous versions the former should be
> supported and the latter would be a bonus (and it wouldn't be too hard
> anyway).
>
>> I'm anyway leaning towards deprecating arr.data, as it's too different from
>> what the Python attribute does.
>
> +1 for that, just write &arr[0] as Sturla mentioned. The transition
> should be trivial.

If there's a confusion due to .data already having a certain meaning
with the python attribute, perhaps it would make sense to have an
attribute with a different name, eg. .ptr or .voidptr ? IMHO writing
&arr[0] looks like a workaround of some kind. Like, when in C you had
something like a 2d array and you'd need to interpret it as a 1d array
you'd write &arr[0][0], but C array syntax doesn't support attributes
which you can add here. Unless of course the idea is to make arrays to
behave and look exactly like C counterparts.

Dimitri.

>
>> Reason I'm asking is that I'm giving a talk on Saturday, and I don't want to
>> teach people bad habits -- so we must figure out what the bad habits are :-)
>> (I think this applies for the PyCon poster as well...)
>>
>> [1] PyData workshop at Google's offices in Mountain View; the event was open
>> for all but now it is full with a long waiting list, which is why I didn't
>> announce it. http://pydataworkshop.eventbrite.com/
>>
>>
>> Dag
>> ___
>> cython-devel mailing list
>> cython-devel@python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
> ___
> cython-devel mailing list
> cython-devel@python.org
> http://mail.python.org/mailman/listinfo/cython-devel
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel