Stefan Behnel, 30.05.2014 15:01: > Emmanuel Gil Peyrot, 28.05.2014 13:25: >> I was testing my cython codebase on top of pypy/cpyext, and I found a >> memory corruption. After investigating it with the pypy guys (thanks >> arigato!), we identified it as a cython bug: >> >> cdef void test(char *string): >> print(string) >> >> def run(array): >> test(array[0]) >> >> >> This code works fine under cpython, but looking at the generated code >> for the last line, the call to test is done with a pointer (__pyx_t_2) >> to some potentially deallocated string (__pyx_t_1), which isn’t freed >> just yet on cpython but is on pypy: >> >> __pyx_t_1 = __Pyx_GetItemInt(__pyx_v_array, 0, …); >> __Pyx_GOTREF(__pyx_t_1); >> __pyx_t_2 = __Pyx_PyObject_AsString(__pyx_t_1); >> __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; >> __pyx_f_1a_test(__pyx_t_2); >> >> >> The obvious solution is to swap the last two lines, it then works fine >> on pypy (although not necessarily if the test function stores the >> pointer somewhere, but it’s not cython’s fault then). >> >> This issue can also happen with an explicit cast: >> >> pointer = <char *>array[0] >> test(pointer) >> >> >> I’m not sure if it should be addressed the same way, because that would >> mean keeping a reference to array[0] for all the lifetime of the >> current scope, but it could still prevent obscure bugs like the memory >> corruption I had. > > Neither of these two examples should compile. Even in CPython, indexing can > not only return a safe reference but a new object, e.g. > > class Indy(object): > def __getitem__(self, i): return "abc%s" % i > > run(Indy()) > > This should crash also in CPython, or at least show unpredictable results. > > Cython has a mechanism to reject this kind of code, not sure why it > wouldn't strike here.
Here's a patch. However, while writing it up, I noticed that there are many cases in CPython where these things are actually safe and nice, e.g. cdef char* charptr for charptr in list_of_byte_strings: ... CPython would allow us to avoid some reference counting here, for example (although I don't think we currently make use of it). And as long as the list doesn't change (which is the user's concern, not Cython's), this is also perfectly safe. Thus, we'd loose a nice feature, which speaks against changing the current behaviour. It's clear that at least emitting a warning in these cases would be helpful, though. An option like -Wcpython could enable special warnings on code that only works on CPython and not on other C-API implementations, such as PyPy. Stefan
# HG changeset patch # Parent 6002d2e87f5c036778d2814595e974d269ba3845 prevent char* coercion of strings coming from indexed containers diff -r 6002d2e87f5c Cython/Compiler/ExprNodes.py --- a/Cython/Compiler/ExprNodes.py Sat May 31 13:15:23 2014 +0200 +++ b/Cython/Compiler/ExprNodes.py Sat May 31 14:26:10 2014 +0200 @@ -10862,7 +10862,8 @@ error(arg.pos, "Cannot convert Python object to '%s'" % result_type) if self.type.is_string or self.type.is_pyunicode_ptr: - if self.arg.is_ephemeral(): + # FIXME: improve IndexNode.is_ephemeral() to avoid this special casing + if self.arg.is_ephemeral() or (self.arg.type.is_pyobject and self.arg.is_subscript): error(arg.pos, "Obtaining '%s' from temporary Python value" % result_type) elif self.arg.is_name and self.arg.entry and self.arg.entry.is_pyglobal: diff -r 6002d2e87f5c tests/errors/charptr_from_temp.pyx --- a/tests/errors/charptr_from_temp.pyx Sat May 31 13:15:23 2014 +0200 +++ b/tests/errors/charptr_from_temp.pyx Sat May 31 14:26:10 2014 +0200 @@ -18,6 +18,13 @@ # temp => error cptr = s + b"cba" +# indexing => error +cptr = s[0] +cdef char* x = <char*>s[0] + +# slicing => error +cptr = s[:2] + cdef unicode c_u = u"abc" u = u"abc" @@ -40,6 +47,9 @@ _ERRORS = """ 16:8: Obtaining 'char *' from externally modifiable global Python value 19:9: Obtaining 'char *' from temporary Python value -34:9: Obtaining 'Py_UNICODE *' from externally modifiable global Python value -37:10: Obtaining 'Py_UNICODE *' from temporary Python value +22:8: Obtaining 'char *' from temporary Python value +23:23: Obtaining 'char *' from temporary Python value +26:8: Obtaining 'char *' from temporary Python value +41:9: Obtaining 'Py_UNICODE *' from externally modifiable global Python value +44:10: Obtaining 'Py_UNICODE *' from temporary Python value """
_______________________________________________ cython-devel mailing list cython-devel@python.org https://mail.python.org/mailman/listinfo/cython-devel