Re: [Cython] Wrong order of __Pyx_DECREF when calling a function with an implicit str → char* conversion.

2014-05-31 Thread Stefan Behnel
Stefan Behnel, 30.05.2014 15:01:
> Emmanuel Gil Peyrot, 28.05.2014 13:25:
>> I was testing my cython codebase on top of pypy/cpyext, and I found a
>> memory corruption.  After investigating it with the pypy guys (thanks
>> arigato!), we identified it as a cython bug:
>>
>>  cdef void test(char *string):
>>  print(string)
>>
>>  def run(array):
>>  test(array[0])
>>
>>
>> This code works fine under cpython, but looking at the generated code
>> for the last line, the call to test is done with a pointer (__pyx_t_2)
>> to some potentially deallocated string (__pyx_t_1), which isn’t freed
>> just yet on cpython but is on pypy:
>>
>>  __pyx_t_1 = __Pyx_GetItemInt(__pyx_v_array, 0, …);
>>  __Pyx_GOTREF(__pyx_t_1);
>>  __pyx_t_2 = __Pyx_PyObject_AsString(__pyx_t_1);
>>  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
>>  __pyx_f_1a_test(__pyx_t_2);
>>
>>
>> The obvious solution is to swap the last two lines, it then works fine
>> on pypy (although not necessarily if the test function stores the
>> pointer somewhere, but it’s not cython’s fault then).
>>
>> This issue can also happen with an explicit cast:
>>
>>  pointer = array[0]
>>  test(pointer)
>>
>>
>> I’m not sure if it should be addressed the same way, because that would
>> mean keeping a reference to array[0] for all the lifetime of the
>> current scope, but it could still prevent obscure bugs like the memory
>> corruption I had.
> 
> Neither of these two examples should compile. Even in CPython, indexing can
> not only return a safe reference but a new object, e.g.
> 
> class Indy(object):
> def __getitem__(self, i): return "abc%s" % i
> 
> run(Indy())
> 
> This should crash also in CPython, or at least show unpredictable results.
> 
> Cython has a mechanism to reject this kind of code, not sure why it
> wouldn't strike here.

Here's a patch. However, while writing it up, I noticed that there are many
cases in CPython where these things are actually safe and nice, e.g.

cdef char* charptr
for charptr in list_of_byte_strings:
...

CPython would allow us to avoid some reference counting here, for example
(although I don't think we currently make use of it). And as long as the
list doesn't change (which is the user's concern, not Cython's), this is
also perfectly safe. Thus, we'd loose a nice feature, which speaks against
changing the current behaviour.

It's clear that at least emitting a warning in these cases would be
helpful, though. An option like -Wcpython could enable special warnings on
code that only works on CPython and not on other C-API implementations,
such as PyPy.

Stefan

# HG changeset patch
# Parent 6002d2e87f5c036778d2814595e974d269ba3845
prevent char* coercion of strings coming from indexed containers

diff -r 6002d2e87f5c Cython/Compiler/ExprNodes.py
--- a/Cython/Compiler/ExprNodes.py	Sat May 31 13:15:23 2014 +0200
+++ b/Cython/Compiler/ExprNodes.py	Sat May 31 14:26:10 2014 +0200
@@ -10862,7 +10862,8 @@
 error(arg.pos,
   "Cannot convert Python object to '%s'" % result_type)
 if self.type.is_string or self.type.is_pyunicode_ptr:
-if self.arg.is_ephemeral():
+# FIXME: improve IndexNode.is_ephemeral() to avoid this special casing
+if self.arg.is_ephemeral() or (self.arg.type.is_pyobject and self.arg.is_subscript):
 error(arg.pos,
   "Obtaining '%s' from temporary Python value" % result_type)
 elif self.arg.is_name and self.arg.entry and self.arg.entry.is_pyglobal:
diff -r 6002d2e87f5c tests/errors/charptr_from_temp.pyx
--- a/tests/errors/charptr_from_temp.pyx	Sat May 31 13:15:23 2014 +0200
+++ b/tests/errors/charptr_from_temp.pyx	Sat May 31 14:26:10 2014 +0200
@@ -18,6 +18,13 @@
 # temp => error
 cptr = s + b"cba"
 
+# indexing => error
+cptr = s[0]
+cdef char* x = s[0]
+
+# slicing => error
+cptr = s[:2]
+
 
 cdef unicode  c_u = u"abc"
 u = u"abc"
@@ -40,6 +47,9 @@
 _ERRORS = """
 16:8: Obtaining 'char *' from externally modifiable global Python value
 19:9: Obtaining 'char *' from temporary Python value
-34:9: Obtaining 'Py_UNICODE *' from externally modifiable global Python value
-37:10: Obtaining 'Py_UNICODE *' from temporary Python value
+22:8: Obtaining 'char *' from temporary Python value
+23:23: Obtaining 'char *' from temporary Python value
+26:8: Obtaining 'char *' from temporary Python value
+41:9: Obtaining 'Py_UNICODE *' from externally modifiable global Python value
+44:10: Obtaining 'Py_UNICODE *' from temporary Python value
 """
___
cython-devel mailing list
cython-devel@python.org
https://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Wrong order of __Pyx_DECREF when calling a function with an implicit str → char* conversion.

2014-05-31 Thread Greg Ewing

Stefan Behnel wrote:

Stefan Behnel, 30.05.2014 15:01:


Emmanuel Gil Peyrot, 28.05.2014 13:25:


I was testing my cython codebase on top of pypy/cpyext, and I found a
memory corruption.  After investigating it with the pypy guys (thanks
arigato!), we identified it as a cython bug:

cdef void test(char *string):
print(string)

def run(array):
test(array[0])


I wouldn't call it a bug, more a limitation. When casting a
Python object to char *, it's the programmer's responsibility
to ensure that a reference is kept to the underlying object
for as long as necessary.

Having said that, keeping the reference for the duration of
the call would result in less surprising behaviour in cases
like this.


Cython has a mechanism to reject this kind of code, not sure why it
wouldn't strike here.


Not sure about Cython, but Pyrex only catches the most glaring
cases of this, such as adding two string and then immediately
casting the result to char *, which is almost guaranteed to
fail. Trying to catch anything more would have resulted in huge
numbers of false positives and been very annoying, so I
didn't try.

--
Greg
___
cython-devel mailing list
cython-devel@python.org
https://mail.python.org/mailman/listinfo/cython-devel