Re: [Numpy-discussion] Status of NumPy and Python 3.3

Christoph Gohlke Sat, 28 Jul 2012 23:25:34 -0700

On 7/28/2012 6:17 PM, Christoph Gohlke wrote:
> On 7/28/2012 6:09 PM, Ondřej Čertík wrote:
>> On Sat, Jul 28, 2012 at 5:09 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>>> On Sat, Jul 28, 2012 at 3:31 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>>>> On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>>>>> Many of the failures in
>>>>> https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
>>>>> are of the type:
>>>>>
>>>>> ======================================================================
>>>>> FAIL: Check byteorder of single-dimensional objects
>>>>> ----------------------------------------------------------------------
>>>>> Traceback (most recent call last):
>>>>>     File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py", line 286, in test_valuesSD
>>>>>       self.assertTrue(ua[0] != ua2[0])
>>>>> AssertionError: False is not true
>>>>>
>>>>>
>>>>> and those are caused by the following minimal example:
>>>>>
>>>>> Python 3.2:
>>>>>
>>>>>>>> from numpy import array
>>>>>>>> a = array(["abc"])
>>>>>>>> b = a.newbyteorder()
>>>>>>>> a.dtype
>>>>> dtype('<U3')
>>>>>>>> b.dtype
>>>>> dtype('>U3')
>>>>>>>> a[0].dtype
>>>>> dtype('<U3')
>>>>>>>> b[0].dtype
>>>>> dtype('<U6')
>>>>>>>> a[0] == b[0]
>>>>> False
>>>>>>>> a[0]
>>>>> 'abc'
>>>>>>>> b[0]
>>>>> 'ៀ\udc00埀\udc00韀\udc00'
>>>>>
>>>>>
>>>>> Python 3.3:
>>>>>
>>>>>
>>>>>>>> from numpy import array
>>>>>>>> a = array(["abc"])
>>>>>>>> b = a.newbyteorder()
>>>>>>>> a.dtype
>>>>> dtype('<U3')
>>>>>>>> b.dtype
>>>>> dtype('>U3')
>>>>>>>> a[0].dtype
>>>>> dtype('<U3')
>>>>>>>> b[0].dtype
>>>>> dtype('<U3')
>>>>>>>> a[0] == b[0]
>>>>> True
>>>>>>>> a[0]
>>>>> 'abc'
>>>>>>>> b[0]
>>>>> 'abc'
>>>>>
>>>>>
>>>>> So somehow the newbyteorder() method doesn't change the dtype of the
>>>>> elements in our new code.
>>>>> This method is implemented in numpy/core/src/multiarray/descriptor.c
>>>>> (I think), but so far I don't see
>>>>> where the problem could be.
>>>>>
>>>>> Any ideas?
>>>>
>>>> Ok, after some investigating, I think we need to do something along
>>>> these lines:
>>>>
>>>> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
>>>> index c134aed..daf7fc4 100644
>>>> --- a/numpy/core/src/multiarray/scalarapi.c
>>>> +++ b/numpy/core/src/multiarray/scalarapi.c
>>>> @@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>>>>    #if PY_VERSION_HEX >= 0x03030000
>>>>        if (type_num == NPY_UNICODE) {
>>>>            PyObject *b, *args;
>>>> -        b = PyBytes_FromStringAndSize(data, itemsize);
>>>> +        if (swap) {
>>>> +            char *buffer;
>>>> +            buffer = malloc(itemsize);
>>>> +            if (buffer == NULL) {
>>>> +                PyErr_NoMemory();
>>>> +            }
>>>> +            memcpy(buffer, data, itemsize);
>>>> +            byte_swap_vector(buffer, itemsize, 4);
>>>> +            b = PyBytes_FromStringAndSize(buffer, itemsize);
>>>> +            // We have to deallocate this later, otherwise we get a segfault...
>>>> +            //free(buffer);
>>>> +        } else {
>>>> +            b = PyBytes_FromStringAndSize(data, itemsize);
>>>> +        }
>>>>            if (b == NULL) {
>>>>                return NULL;
>>>>            }
>>>>
>>>> This particular implementation still fails though:
>>>>
>>>>
>>>>>>> from numpy import array
>>>>>>> a = array(["abc"])
>>>>>>> b = a.newbyteorder()
>>>>>>> a.dtype
>>>> dtype('<U3')
>>>>>>> b.dtype
>>>> dtype('>U3')
>>>>>>> a[0].dtype
>>>> dtype('<U3')
>>>>>>> b[0].dtype
>>>> Traceback (most recent call last):
>>>>     File "<stdin>", line 1, in <module>
>>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
>>>> codepoint not in range(0x110000)
>>>>>>> a[0] == b[0]
>>>> Traceback (most recent call last):
>>>>     File "<stdin>", line 1, in <module>
>>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
>>>> codepoint not in range(0x110000)
>>>>>>> a[0]
>>>> 'abc'
>>>>>>> b[0]
>>>> Traceback (most recent call last):
>>>>     File "<stdin>", line 1, in <module>
>>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
>>>> codepoint not in range(0x110000)
>>>>
>>>>
>>>>
>>>> But I think that we simply need to take into account the "swap" flag.
>>>
>>> Ok, so first of all, I tried to disable the swapping in Python 3.2:
>>>
>>>                   if (swap) {
>>>                       byte_swap_vector(buffer, itemsize >> 2, 4);
>>>                   }
>>>
>>> And then it behaves *exactly* as in Python 3.3. So I am pretty sure
>>> that the problem is right there and something
>>> along the lines of my patch above should fix it. I had a few bugs
>>> there, here is the correct version:
>>>
>>> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
>>> index c134aed..bed73f7 100644
>>> --- a/numpy/core/src/multiarray/scalarapi.c
>>> +++ b/numpy/core/src/multiarray/scalarapi.c
>>> @@ -644,7 +644,19 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>>>    #if PY_VERSION_HEX >= 0x03030000
>>>        if (type_num == NPY_UNICODE) {
>>>            PyObject *b, *args;
>>> -        b = PyBytes_FromStringAndSize(data, itemsize);
>>> +        if (swap) {
>>> +            char *buffer;
>>> +            buffer = malloc(itemsize);
>>> +            if (buffer == NULL) {
>>> +                return PyErr_NoMemory();
>>> +            }
>>> +            memcpy(buffer, data, itemsize);
>>> +            byte_swap_vector(buffer, itemsize >> 2, 4);
>>> +            b = PyBytes_FromStringAndSize(buffer, itemsize);
>>> +            free(buffer);
>>> +        } else {
>>> +            b = PyBytes_FromStringAndSize(data, itemsize);
>>> +        }
>>>            if (b == NULL) {
>>>                return NULL;
>>>            }
>>>
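
On the count argument: as far as I can tell, byte_swap_vector takes the
number of elements rather than the number of bytes, which would explain
why itemsize >> 2 is needed for 4-byte UCS4 units. Below is a rough
Python model of that per-unit swap. Note that swap_units is a made-up
helper for illustration only, not a NumPy function, and the sketch
assumes the buffer holds whole 4-byte units.

```python
def swap_units(buf: bytes, unit_size: int = 4) -> bytes:
    """Reverse the bytes inside each unit_size-byte unit of buf.

    Hypothetical helper: a pure-Python model of a per-element byte swap.
    """
    if len(buf) % unit_size:
        raise ValueError("buffer length must be a multiple of unit_size")
    n_units = len(buf) // unit_size  # analogue of itemsize >> 2 for 4-byte units
    return b"".join(
        buf[i * unit_size:(i + 1) * unit_size][::-1] for i in range(n_units)
    )


little = "abc".encode("utf-32-le")  # 12 bytes: 3 units of 4 bytes each
big = swap_units(little)
print(big == "abc".encode("utf-32-be"))
print(big.decode("utf-32-be"))
```

Swapping unit by unit turns the little-endian buffer into the
big-endian one. Passing the byte length as the element count would walk
four times too far, past the end of the buffer.
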
>>>
>>> That works well, except that it gives the UnicodeDecodeError:
>>>
>>>>>> b[0].dtype
>>> NULL
>>> Traceback (most recent call last):
>>>     File "<stdin>", line 1, in <module>
>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
>>> codepoint not in range(0x110000)
>>>
>>> This error is actually triggered by this line:
>>>
>>>
>>>           obj = type->tp_new(type, args, NULL);
>>>
>>> in the patch by Stefan above. So I think what is happening is that it
>>> simply tries to convert it from bytes
>>> to a string and fails. That makes great sense. The question is why
>>> doesn't it fail in exactly the same way
>>> in Python 3.2? I think it's because the conversion check is bypassed
>>> somehow. Stefan, I think
>>> we need to swap it after the object is created. I am still
>>> experimenting with this.
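
The decode error itself can be reproduced without NumPy. Here is a
small sketch, assuming the scalar constructor ends up decoding the raw
buffer as UTF-32 in native byte order (that is only my reading of the
traceback, I have not checked it in the CPython source). The helper
decode_or_error is hypothetical and exists only for this illustration.

```python
def decode_or_error(raw: bytes, codec: str) -> str:
    """Return the decoded text, or the exception class name if decoding fails."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


raw = "abc".encode("utf-32-le")  # little-endian UCS4 bytes for 'abc'
print(decode_or_error(raw, "utf-32-le"))
print(decode_or_error(raw, "utf-32-be"))
```

Read in the wrong byte order, the first unit becomes 0x61000000, which
is above the largest valid code point 0x10FFFF, so a strict decoder has
to reject it.
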
>>
>> Well, I simply went to the Python sources and then implemented a
>> solution that works with this patch:
>>
>> https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27654
>>
>> So now the PR actually seems to work. The rest of the failures are here:
>>
>> https://gist.github.com/3195520
>>
>> and they seem to be unrelated. Can somebody please review this PR?
>>
>> https://github.com/numpy/numpy/pull/366
>>
>>
>> I will squash the commits after it's reviewed (I want to keep the
>> history there for now).
>>
>>
>> Ondrej
>
>
> Thank you. I backported the PR to numpy 1.6.2 and it works for me on
> win-amd64-py3.3 with the msvc10 compiler. I get the same 5 test failures
> of the kind:
>
> AssertionError:
> Items are not equal:
>    ACTUAL: ()
>    DESIRED: None
>
>
> Christoph


Pull request #367 should fix the NewBufferProtocol test failures.

https://github.com/numpy/numpy/pull/367

Christoph
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
