Stefan Behnel, 30.05.2014 15:01:
> Emmanuel Gil Peyrot, 28.05.2014 13:25:
>> I was testing my cython codebase on top of pypy/cpyext, and I found a
>> memory corruption.  After investigating it with the pypy guys (thanks
>> arigato!), we identified it as a cython bug:
>>
>>  cdef void test(char *string):
>>      print(string)
>>
>>  def run(array):
>>      test(array[0])
>>
>>
>> This code works fine under cpython, but looking at the generated code
>> for the last line, the call to test is done with a pointer (__pyx_t_2)
>> to some potentially deallocated string (__pyx_t_1), which isn’t freed
>> just yet on cpython but is on pypy:
>>
>>  __pyx_t_1 = __Pyx_GetItemInt(__pyx_v_array, 0, …);
>>  __Pyx_GOTREF(__pyx_t_1);
>>  __pyx_t_2 = __Pyx_PyObject_AsString(__pyx_t_1);
>>  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
>>  __pyx_f_1a_test(__pyx_t_2);
>>
>>
>> The obvious solution is to swap the last two lines, it then works fine
>> on pypy (although not necessarily if the test function stores the
>> pointer somewhere, but it’s not cython’s fault then).
>>
>> This issue can also happen with an explicit cast:
>>
>>  pointer = <char *>array[0]
>>  test(pointer)
>>
>>
>> I’m not sure if it should be addressed the same way, because that would
>> mean keeping a reference to array[0] for all the lifetime of the
>> current scope, but it could still prevent obscure bugs like the memory
>> corruption I had.
> 
> Neither of these two examples should compile. Even in CPython, indexing can
> not only return a safe reference but a new object, e.g.
> 
>     class Indy(object):
>         def __getitem__(self, i): return "abc%s" % i
> 
>     run(Indy())
> 
> This should crash also in CPython, or at least show unpredictable results.
> 
> Cython has a mechanism to reject this kind of code, not sure why it
> wouldn't strike here.

Here's a patch. However, while writing it up, I noticed that there are many
cases in CPython where these things are actually safe and nice, e.g.

    cdef char* charptr
    for charptr in list_of_byte_strings:
        ...

CPython would allow us to avoid some reference counting here, for example
(although I don't think we currently make use of it). And as long as the
list doesn't change (which is the user's concern, not Cython's), this is
also perfectly safe. Thus, we'd loose a nice feature, which speaks against
changing the current behaviour.

It's clear that at least emitting a warning in these cases would be
helpful, though. An option like -Wcpython could enable special warnings on
code that only works on CPython and not on other C-API implementations,
such as PyPy.

Stefan

# HG changeset patch
# Parent 6002d2e87f5c036778d2814595e974d269ba3845
prevent char* coercion of strings coming from indexed containers

diff -r 6002d2e87f5c Cython/Compiler/ExprNodes.py
--- a/Cython/Compiler/ExprNodes.py	Sat May 31 13:15:23 2014 +0200
+++ b/Cython/Compiler/ExprNodes.py	Sat May 31 14:26:10 2014 +0200
@@ -10862,7 +10862,8 @@
             error(arg.pos,
                   "Cannot convert Python object to '%s'" % result_type)
         if self.type.is_string or self.type.is_pyunicode_ptr:
-            if self.arg.is_ephemeral():
+            # FIXME: improve IndexNode.is_ephemeral() to avoid this special casing
+            if self.arg.is_ephemeral() or (self.arg.type.is_pyobject and self.arg.is_subscript):
                 error(arg.pos,
                       "Obtaining '%s' from temporary Python value" % result_type)
             elif self.arg.is_name and self.arg.entry and self.arg.entry.is_pyglobal:
diff -r 6002d2e87f5c tests/errors/charptr_from_temp.pyx
--- a/tests/errors/charptr_from_temp.pyx	Sat May 31 13:15:23 2014 +0200
+++ b/tests/errors/charptr_from_temp.pyx	Sat May 31 14:26:10 2014 +0200
@@ -18,6 +18,13 @@
 # temp => error
 cptr = s + b"cba"
 
+# indexing => error
+cptr = s[0]
+cdef char* x = <char*>s[0]
+
+# slicing => error
+cptr = s[:2]
+
 
 cdef unicode  c_u = u"abc"
 u = u"abc"
@@ -40,6 +47,9 @@
 _ERRORS = """
 16:8: Obtaining 'char *' from externally modifiable global Python value
 19:9: Obtaining 'char *' from temporary Python value
-34:9: Obtaining 'Py_UNICODE *' from externally modifiable global Python value
-37:10: Obtaining 'Py_UNICODE *' from temporary Python value
+22:8: Obtaining 'char *' from temporary Python value
+23:23: Obtaining 'char *' from temporary Python value
+26:8: Obtaining 'char *' from temporary Python value
+41:9: Obtaining 'Py_UNICODE *' from externally modifiable global Python value
+44:10: Obtaining 'Py_UNICODE *' from temporary Python value
 """
_______________________________________________
cython-devel mailing list
cython-devel@python.org
https://mail.python.org/mailman/listinfo/cython-devel

Reply via email to