On 7/28/2012 6:17 PM, Christoph Gohlke wrote:
> On 7/28/2012 6:09 PM, Ondřej Čertík wrote:
>> On Sat, Jul 28, 2012 at 5:09 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>>> On Sat, Jul 28, 2012 at 3:31 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>>>> On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík <ondrej.cer...@gmail.com> wrote:
>>>>> Many of the failures in
>>>>> https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
>>>>> are of the type:
>>>>>
>>>>> ======================================================================
>>>>> FAIL: Check byteorder of single-dimensional objects
>>>>> ----------------------------------------------------------------------
>>>>> Traceback (most recent call last):
>>>>>   File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py", line 286, in test_valuesSD
>>>>>     self.assertTrue(ua[0] != ua2[0])
>>>>> AssertionError: False is not true
>>>>>
>>>>> and those are caused by the following minimal example:
>>>>>
>>>>> Python 3.2:
>>>>>
>>>>> >>> from numpy import array
>>>>> >>> a = array(["abc"])
>>>>> >>> b = a.newbyteorder()
>>>>> >>> a.dtype
>>>>> dtype('<U3')
>>>>> >>> b.dtype
>>>>> dtype('>U3')
>>>>> >>> a[0].dtype
>>>>> dtype('<U3')
>>>>> >>> b[0].dtype
>>>>> dtype('<U6')
>>>>> >>> a[0] == b[0]
>>>>> False
>>>>> >>> a[0]
>>>>> 'abc'
>>>>> >>> b[0]
>>>>> 'ៀ\udc00埀\udc00韀\udc00'
>>>>>
>>>>> Python 3.3:
>>>>>
>>>>> >>> from numpy import array
>>>>> >>> a = array(["abc"])
>>>>> >>> b = a.newbyteorder()
>>>>> >>> a.dtype
>>>>> dtype('<U3')
>>>>> >>> b.dtype
>>>>> dtype('>U3')
>>>>> >>> a[0].dtype
>>>>> dtype('<U3')
>>>>> >>> b[0].dtype
>>>>> dtype('<U3')
>>>>> >>> a[0] == b[0]
>>>>> True
>>>>> >>> a[0]
>>>>> 'abc'
>>>>> >>> b[0]
>>>>> 'abc'
>>>>>
>>>>> So somehow the newbyteorder() method doesn't change the dtype of the
>>>>> elements in our new code.
>>>>> This method is implemented in numpy/core/src/multiarray/descriptor.c
>>>>> (I think), but so far I don't see where the problem could be.
>>>>>
>>>>> Any ideas?
>>>>
>>>> Ok, after some investigating, I think we need to do something along these
>>>> lines:
>>>>
>>>> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
>>>> index c134aed..daf7fc4 100644
>>>> --- a/numpy/core/src/multiarray/scalarapi.c
>>>> +++ b/numpy/core/src/multiarray/scalarapi.c
>>>> @@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>>>>  #if PY_VERSION_HEX >= 0x03030000
>>>>      if (type_num == NPY_UNICODE) {
>>>>          PyObject *b, *args;
>>>> -        b = PyBytes_FromStringAndSize(data, itemsize);
>>>> +        if (swap) {
>>>> +            char *buffer;
>>>> +            buffer = malloc(itemsize);
>>>> +            if (buffer == NULL) {
>>>> +                PyErr_NoMemory();
>>>> +            }
>>>> +            memcpy(buffer, data, itemsize);
>>>> +            byte_swap_vector(buffer, itemsize, 4);
>>>> +            b = PyBytes_FromStringAndSize(buffer, itemsize);
>>>> +            // We have to deallocate this later, otherwise we get a segfault...
>>>> +            //free(buffer);
>>>> +        } else {
>>>> +            b = PyBytes_FromStringAndSize(data, itemsize);
>>>> +        }
>>>>          if (b == NULL) {
>>>>              return NULL;
>>>>          }
>>>>
>>>> This particular implementation still fails though:
>>>>
>>>> >>> from numpy import array
>>>> >>> a = array(["abc"])
>>>> >>> b = a.newbyteorder()
>>>> >>> a.dtype
>>>> dtype('<U3')
>>>> >>> b.dtype
>>>> dtype('>U3')
>>>> >>> a[0].dtype
>>>> dtype('<U3')
>>>> >>> b[0].dtype
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>>> >>> a[0] == b[0]
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>>> >>> a[0]
>>>> 'abc'
>>>> >>> b[0]
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>>>
>>>> But I think that we simply need to take into account the "swap" flag.
>>>
>>> Ok, so first of all, I tried to disable the swapping in Python 3.2:
>>>
>>>     if (swap) {
>>>         byte_swap_vector(buffer, itemsize >> 2, 4);
>>>     }
>>>
>>> And then it behaves *exactly* as in Python 3.3. So I am pretty sure
>>> that the problem is right there and something along the lines of my
>>> patch above should fix it. I had a few bugs there, here is the
>>> correct version:
>>>
>>> diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
>>> index c134aed..bed73f7 100644
>>> --- a/numpy/core/src/multiarray/scalarapi.c
>>> +++ b/numpy/core/src/multiarray/scalarapi.c
>>> @@ -644,7 +644,19 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
>>>  #if PY_VERSION_HEX >= 0x03030000
>>>      if (type_num == NPY_UNICODE) {
>>>          PyObject *b, *args;
>>> -        b = PyBytes_FromStringAndSize(data, itemsize);
>>> +        if (swap) {
>>> +            char *buffer;
>>> +            buffer = malloc(itemsize);
>>> +            if (buffer == NULL) {
>>> +                PyErr_NoMemory();
>>> +            }
>>> +            memcpy(buffer, data, itemsize);
>>> +            byte_swap_vector(buffer, itemsize >> 2, 4);
>>> +            b = PyBytes_FromStringAndSize(buffer, itemsize);
>>> +            free(buffer);
>>> +        } else {
>>> +            b = PyBytes_FromStringAndSize(data, itemsize);
>>> +        }
>>>          if (b == NULL) {
>>>              return NULL;
>>>          }
>>>
>>> That works well, except that it gives the UnicodeDecodeError:
>>>
>>> >>> b[0].dtype
>>> NULL
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>>
>>> This error is actually triggered by this line:
>>>
>>>     obj = type->tp_new(type, args, NULL);
>>>
>>> in the patch by Stefan above. So I think what is happening is that it
>>> simply tries to convert it from bytes to a string and fails. That makes
>>> great sense. The question is why doesn't it fail in exactly the same way
>>> in Python 3.2? I think it's because the conversion check is bypassed
>>> somehow. Stefan, I think we need to swap it after the object is created.
>>> I am still experimenting with this.
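
[Inline note on the swap discussion quoted above.] Here is a minimal
pure-Python sketch of why the 4-byte swap matters before the bytes are
decoded. It is not NumPy's code: swap_ucs4 and decode_ucs4 are hypothetical
helpers, and the sketch assumes a little-endian host (as the '<U3' dtypes
above suggest) and that byte_swap_vector takes an element count followed by
an element size, which is what the corrected call with itemsize >> 2 implies.

```python
def swap_ucs4(buf: bytes) -> bytes:
    """Reverse the bytes of every 4-byte code unit (hypothetical helper)."""
    if len(buf) % 4:
        raise ValueError("UCS4 buffer length must be a multiple of 4")
    count = len(buf) >> 2  # number of 4-byte units, mirroring itemsize >> 2
    return b"".join(buf[4 * i:4 * i + 4][::-1] for i in range(count))


def decode_ucs4(buf: bytes, swap: bool) -> str:
    """Decode as little-endian UCS4, swapping first when the data is foreign."""
    if swap:
        buf = swap_ucs4(buf)
    return buf.decode("utf-32-le")


# Stand-in for the raw bytes of one item of a '>U3' array holding "abc".
big_endian = "abc".encode("utf-32-be")

try:
    decode_ucs4(big_endian, swap=False)
    unswapped_ok = True
except UnicodeDecodeError:
    # Read as little-endian, the first unit is 0x61000000, above 0x10FFFF.
    unswapped_ok = False

swapped = decode_ucs4(big_endian, swap=True)
print(unswapped_ok, swapped)  # False abc
```

Under that assumption about byte_swap_vector, the first patch asked for
itemsize units of 4 bytes each, four times the size of the malloc'ed buffer,
which may explain the segfault seen around free().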
>>
>> Well, I simply went to the Python sources and then implemented a
>> solution that works with this patch:
>>
>> https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27654
>>
>> So now the PR actually seems to work. The rest of the failures are here:
>>
>> https://gist.github.com/3195520
>>
>> and they seem to be unrelated. Can somebody please review this PR?
>>
>> https://github.com/numpy/numpy/pull/366
>>
>> I will squash the commits after it's reviewed (I want to keep the
>> history there for now).
>>
>> Ondrej
>
> Thank you. I backported the PR to numpy 1.6.2 and it works for me on
> win-amd64-py3.3 with the msvc10 compiler. I get the same 5 test failures
> of the kind:
>
> AssertionError:
> Items are not equal:
>  ACTUAL: ()
>  DESIRED: None
>
> Christoph

Pull request #367 should fix the NewBufferProtocol test failures.

https://github.com/numpy/numpy/pull/367

Christoph
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion