Joshua Adelman, 06.02.2014 16:21: > This discussion was initially started on the cython user google group, but I > wanted to move the issue over to the dev list, as per the suggestion on > cython_trac.
Original discussion is here: http://thread.gmane.org/gmane.comp.python.cython.user/10646 > Given a numpy recarray containing two or more fixed-length string fields, if > those string fields are adjacent to one another cython does not properly > detect the boundary between the string fields. A concise test case > demonstrating the problem is: > > ```cython > cimport numpy as np > > cdef packed struct tstruct: > np.float32_t a > np.int16_t b > char[6] c > char[4] d > > def test_struct(tstruct[:] x): > pass > ``` > > We then define some data on the python side: > > ```python > import numpy as np > > a = np.recarray(3, dtype=[('a', np.float32), ('b', np.int16), ('c', '|S6'), > ('d', '|S4')]) > a[0] = (1.1, 1, 'abcde', 'fgh') > a[1] = (2.1, 2, 'ijklm', 'nop') > a[2] = (3.1, 3, 'qrstu', 'vwx') > > test_struct(a) > ``` > > This results in the error: > > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > > <ipython-input-12-ac01118a36a7> in <module>() > ----> 1 test_struct(a) > > ValueError: Expected a dimension of size 6, got 10 > > > If we swap the order of the fields in the recarray and `tstruct` to (a,c,b,d) > so that there is a numerical field between the string fields, then the > function can parse the memory view correctly. > > The relevant line of code that catches the incorrect value of `enc_count` is: > https://github.com/cython/cython/blob/master/Cython/Utility/Buffer.c#L468 > > ``` > if (ctx->enc_count != ctx->head->field->type->arraysize[0]) { > PyErr_Format(PyExc_ValueError, > "Expected a dimension of size %zu, got %zu", > ctx->head->field->type->arraysize[0], > ctx->enc_count); > return -1; > } > ``` > > My naive guess is that there is something going on in: > https://github.com/cython/cython/blob/master/Cython/Utility/Buffer.c#L738 i.e. this code: """ case 'O': case 's': case 'p': if (ctx->enc_type == *ts && got_Z == ctx->is_complex && ctx->enc_packmode == ctx->new_packmode) { /* Continue pooling same type */ ctx->enc_count += ctx->new_count; } else { /* New type */ if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL; ctx->enc_count = ctx->new_count; ctx->enc_packmode = ctx->new_packmode; ctx->enc_type = *ts; ctx->is_complex = got_Z; } ++ts; ctx->new_count = 1; got_Z = 0; break; """ > since that appears to be the only place where `enc_count` is being > incremented. That would seem like the place where a boundary between two > string fields might not be properly handled (the comment in the line above > "Continue pooling same type" is suggestive. Yes. That code is basically unchanged since Dag committed it back in 2009 (CC-ing him). However, this parser is fairly complex (not surprisingly, the buffer type format *is* complicated), so I'm not sure about the right fix. My guess is that the first part of the conditional belongs more in the "case 'Z'" fall-through section that precedes the code above. Hard to say. Even when I disable it ("if (0 &&...)"), I don't get any test failures in the format tests. In any case, I agree with you that the "pooling same size" is the bit that goes wrong here. I've attached a test case for now. Stefan
diff -r 5e269313b1a1 tests/buffers/buffmt.pyx --- a/tests/buffers/buffmt.pyx Sat Feb 08 12:11:59 2014 +0100 +++ b/tests/buffers/buffmt.pyx Sat Feb 08 19:45:36 2014 +0100 @@ -377,6 +377,22 @@ fmt, sizeof(PartiallyPackedStruct2)) +cdef packed struct PackedStructWithCharArrays: + float a + int b + char[5] c + char[3] d + + +@testcase +def packed_struct_with_strings(fmt): + """ + >>> packed_struct_with_strings("T{f:a:i:b:5s:c:3s:d:}") + """ + cdef object[PackedStructWithCharArrays] buf = MockBuffer( + fmt, sizeof(PackedStructWithCharArrays)) + + # TODO: empty struct # TODO: Incomplete structs # TODO: mixed structs
_______________________________________________ cython-devel mailing list cython-devel@python.org https://mail.python.org/mailman/listinfo/cython-devel