Dag Sverre Seljebotn, 03.01.2011 19:10:
> On 01/03/2011 06:35 PM, Robert Bradshaw wrote:
>> On Mon, Jan 3, 2011 at 4:01 AM, Lisandro Dalcin<[email protected]> wrote:
>>
>>> On 3 January 2011 04:41, Stefan Behnel<[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been working on a fix for ticket #602, negative indexing for inferred
>>>> char*.
>>>>
>>>> http://trac.cython.org/cython_trac/ticket/602
>>>>
>>>> Currently, when you write this:
>>>>
>>>>     s = b'abc'
>>>>
>>>> s is inferred as char*. This has several drawbacks. For one, we lose the
>>>> length information, so "len(s)" becomes O(n) instead of O(1). Negative
>>>> indexing fails completely because it will use pointer arithmetic, thus
>>>> leaving the allocated memory area of the string. Also, code like the
>>>> following is extremely inefficient because it requires multiple conversions
>>>> from a char* of unknown length to a Python bytes object:
>>>>
>>>>     s = b'abc'
>>>>     a = s1 + s
>>>>     b = s2 + s
>>>>
>>>> I came to the conclusion that the right fix is to stop letting byte string
>>>> literals start off as char*. This immediately fixes these issues and
>>>> improves Python compatibility while still allowing automatic coercion, but
>>>> it also comes with its own drawbacks.
>>>>
>>>> In nogil blocks, you will have to explicitly declare a variable as char*
>>>> when assigning a byte string literal to it, otherwise you'd get a compile
>>>> time error for a Python object assignment. I think this is a minor issue as
>>>> most users would declare their variables anyway when using nogil blocks.
>>>> Given that there isn't much you can do with a Python string inside of a
>>>> nogil block, we could also honour nogil blocks during type inference and
>>>> automatically infer char* for literals here. I don't think it would hurt
>>>> anyone to do that.
>>>>
>>>> The second drawback is that it impacts type inference for char loops.
>>>> Previously, you could write
>>>>
>>>>     s = b'abc'
>>>>     for c in s:
>>>>         print c
>>>>
>>>> and Cython would infer 'char' for c and print integer byte values. When s
>>>> is inferred as 'bytes', c will be inferred as 'Python object' because
>>>> Python 2 returns 1-byte strings and Python 3 returns integers on iteration.
>>>> Thus the loop will run entirely in Python code and return different things
>>>> in Py2 and Py3.
>>>>
>>>> I do not expect that this is a major issue either. Iteration over literals
>>>> should be rare, after all, and if the byte string is constructed in any
>>>> way, the type either becomes a bytes object through Python operations (like
>>>> concatenation) or is explicitly provided, e.g. as a return type of a
>>>> function call. But it is a clear behavioural change for the type inference
>>>> in an area where Cython's (and Python's) semantics are tricky anyway.
>>>>
>>>> Personally, I think that the advantages outweigh the disadvantages here.
>>>> Most common use cases won't notice the change because coercion will not be
>>>> impacted, and most existing code (IMHO) either uses explicit typing or
>>>> expects a Python bytes object anyway. So my preferred change would be to
>>>> make byte string literals 'bytes' by default, except in nogil blocks.
>>>>
>>> +1
>>>
>> +1 I might say it should even be required in nogil blocks for consistency.
>
> +1 to not making nogil blocks a special case, the disadvantage of
> another special case to remember outweighs the advantage of syntactic
> brevity IMO.

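For reference, the Python-level behaviour that a literal typed as 'bytes' would follow can be checked in plain Python 3. This is only a sketch of the interpreter's semantics, not of Cython's generated C code, and the variable names are made up for illustration:

```python
s = b'abc'

# A bytes object stores its length, so len() needs no scan for a
# terminating NUL byte (unlike strlen() on a bare char*).
length = len(s)

# Negative indices count back from the end of the object instead of
# doing pointer arithmetic outside the buffer.
last = s[-1]            # an integer in Python 3

# Iteration yields integers in Python 3; Python 2 yields 1-byte strings,
# which is the Py2/Py3 difference mentioned in the quoted mail.
items = [c for c in s]

# Slicing returns a bytes object rather than an integer.
last_as_bytes = s[-1:]

print(length, last, items, last_as_bytes)
```

Under Python 2 the same loop would produce the 1-byte strings 'a', 'b', 'c' instead of integers.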
Ok, then it's the proposed change without a special case for nogil. That's
better anyway because I just noticed that type inference doesn't know about
nogil environments. They are not determined before the subsequent type
analysis step.

https://github.com/cython/cython/commit/342eb45a2fd19869273ec038144c71ac6e49db0e

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
