Joshua Adelman, 06.02.2014 16:21:
> This discussion was initially started on the cython user google group, but I 
> wanted to move the issue over to the dev list, as per the suggestion on 
> cython_trac. 

Original discussion is here:

http://thread.gmane.org/gmane.comp.python.cython.user/10646


> Given a numpy recarray containing two or more fixed-length string fields, if 
> those string fields are adjacent to one another cython does not properly 
> detect the boundary between the string fields. A concise test case 
> demonstrating the problem is:
> 
> ```cython
> cimport numpy as np
> 
> cdef packed struct tstruct:
>     np.float32_t a
>     np.int16_t b
>     char[6] c
>     char[4] d
> 
> def test_struct(tstruct[:] x):
>     pass
> ```
> 
> We then define some data on the python side:
> 
> ```python
> import numpy as np
> 
> a = np.recarray(3, dtype=[('a', np.float32),  ('b', np.int16), ('c', '|S6'), 
> ('d', '|S4')])
> a[0] = (1.1, 1, 'abcde', 'fgh')
> a[1] = (2.1, 2, 'ijklm', 'nop')
> a[2] = (3.1, 3, 'qrstu', 'vwx')
> 
> test_struct(a)
> ```
> 
> This results in the error:
> 
> ---------------------------------------------------------------------------
> ValueError       Traceback (most recent call last)
> 
> <ipython-input-12-ac01118a36a7> in <module>()
> ----> 1 test_struct(a)
> 
> ValueError: Expected a dimension of size 6, got 10
> 
> 
> If we swap the order of the fields in the recarray and `tstruct` to (a,c,b,d) 
> so that there is a numerical field between the string fields, then the 
> function can parse the memory view correctly. 
> 
> The relevant line of code that catches the incorrect value of `enc_count` is: 
> https://github.com/cython/cython/blob/master/Cython/Utility/Buffer.c#L468 
> 
> ``` 
> if (ctx->enc_count != ctx->head->field->type->arraysize[0]) { 
>             PyErr_Format(PyExc_ValueError, 
>                          "Expected a dimension of size %zu, got %zu", 
>                          ctx->head->field->type->arraysize[0], 
> ctx->enc_count); 
>             return -1; 
>         } 
> ``` 
> 
> My naive guess is that there is something going on in: 
> https://github.com/cython/cython/blob/master/Cython/Utility/Buffer.c#L738

i.e. this code:

"""
      case 'O': case 's': case 'p':
        if (ctx->enc_type == *ts && got_Z == ctx->is_complex &&
            ctx->enc_packmode == ctx->new_packmode) {
          /* Continue pooling same type */
          ctx->enc_count += ctx->new_count;
        } else {
          /* New type */
          if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL;
          ctx->enc_count = ctx->new_count;
          ctx->enc_packmode = ctx->new_packmode;
          ctx->enc_type = *ts;
          ctx->is_complex = got_Z;
        }
        ++ts;
        ctx->new_count = 1;
        got_Z = 0;
        break;
"""

> since that appears to be the only place where `enc_count` is being 
> incremented. That would seem like the place where a boundary between two 
> string fields might not be properly handled (the comment in the line above 
> "Continue pooling same type" is suggestive.

Yes. That code is basically unchanged since Dag committed it back in 2009
(CC-ing him). However, this parser is fairly complex (not surprisingly, the
buffer type format *is* complicated), so I'm not sure about the right fix.
My guess is that the first part of the conditional belongs more in the
"case 'Z'" fall-through section that precedes the code above. Hard to say.
Even when I disable it ("if (0 &&...)"), I don't get any test failures in
the format tests.

In any case, I agree with you that the "pooling same size" is the bit that
goes wrong here.

I've attached a test case for now.

Stefan

diff -r 5e269313b1a1 tests/buffers/buffmt.pyx
--- a/tests/buffers/buffmt.pyx	Sat Feb 08 12:11:59 2014 +0100
+++ b/tests/buffers/buffmt.pyx	Sat Feb 08 19:45:36 2014 +0100
@@ -377,6 +377,22 @@
         fmt, sizeof(PartiallyPackedStruct2))
 
 
+cdef packed struct PackedStructWithCharArrays:
+    float a
+    int b
+    char[5] c
+    char[3] d
+
+
+@testcase
+def packed_struct_with_strings(fmt):
+    """
+    >>> packed_struct_with_strings("T{f:a:i:b:5s:c:3s:d:}")
+    """
+    cdef object[PackedStructWithCharArrays] buf = MockBuffer(
+        fmt, sizeof(PackedStructWithCharArrays))
+
+
 # TODO: empty struct
 # TODO: Incomplete structs
 # TODO: mixed structs
_______________________________________________
cython-devel mailing list
cython-devel@python.org
https://mail.python.org/mailman/listinfo/cython-devel

Reply via email to