Re: [Cython] Fixing #602 - type inference for byte string literals

Stefan Behnel Mon, 03 Jan 2011 23:45:53 -0800

Dag Sverre Seljebotn, 03.01.2011 19:10:
> On 01/03/2011 06:35 PM, Robert Bradshaw wrote:
>> On Mon, Jan 3, 2011 at 4:01 AM, Lisandro Dalcin wrote:
>>
>>> On 3 January 2011 04:41, Stefan Behnel wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been working on a fix for ticket #602, negative indexing for inferred
>>>> char*.
>>>>
>>>> http://trac.cython.org/cython_trac/ticket/602
>>>>
>>>> Currently, when you write this:
>>>>
>>>>       s = b'abc'
>>>>
>>>> s is inferred as char*. This has several drawbacks. For one, we lose the
>>>> length information, so "len(s)" becomes O(n) instead of O(1). Negative
>>>> indexing fails completely because it will use pointer arithmetic, thus
>>>> leaving the allocated memory area of the string. Also, code like the
>>>> following is extremely inefficient because it requires multiple conversions
>>>> from a char* of unknown length to a Python bytes object:
>>>>
>>>>       s = b'abc'
>>>>       a = s1 + s
>>>>       b = s2 + s
>>>>
>>>> I came to the conclusion that the right fix is to stop letting byte string
>>>> literals start off as char*. This immediately fixes these issues and
>>>> improves Python compatibility while still allowing automatic coercion, but
>>>> it also comes with its own drawbacks.
>>>>
>>>> In nogil blocks, you will have to explicitly declare a variable as char*
>>>> when assigning a byte string literal to it, otherwise you'd get a compile
>>>> time error for a Python object assignment. I think this is a minor issue as
>>>> most users would declare their variables anyway when using nogil blocks.
>>>> Given that there isn't much you can do with a Python string inside of a
>>>> nogil block, we could also honour nogil blocks during type inference and
>>>> automatically infer char* for literals here. I don't think it would hurt
>>>> anyone to do that.
>>>>
>>>> The second drawback is that it impacts type inference for char loops.
>>>> Previously, you could write
>>>>
>>>>       s = b'abc'
>>>>       for c in s:
>>>>           print c
>>>>
>>>> and Cython would infer 'char' for c and print integer byte values. When s
>>>> is inferred as 'bytes', c will be inferred as 'Python object' because
>>>> Python 2 returns 1-byte strings and Python 3 returns integers on iteration.
>>>> Thus the loop will run entirely in Python code and return different things
>>>> in Py2 and Py3.
>>>>
>>>> I do not expect that this is a major issue either. Iteration over literals
>>>> should be rare, after all, and if the byte string is constructed in any
>>>> way, the type either becomes a bytes object through Python operations (like
>>>> concatenation) or is explicitly provided, e.g. as a return type of a
>>>> function call. But it is a clear behavioural change for the type inference
>>>> in an area where Cython's (and Python's) semantics are tricky anyway.
>>>>
>>>> Personally, I think that the advantages outweigh the disadvantages here.
>>>> Most common use cases won't notice the change because coercion will not be
>>>> impacted, and most existing code (IMHO) either uses explicit typing or
>>>> expects a Python bytes object anyway. So my preferred change would be to
>>>> make byte string literals 'bytes' by default, except in nogil blocks.
>>>>
>>> +1
>>>
>> +1 I might say it should even be required in nogil blocks for consistency.
>
> +1 to not making nogil blocks a special case, the disadvantage of
> another special case to remember outweighs the advantage of syntactic
> brevity IMO.


Ok, then it's the proposed change without a special case for nogil. That's 
better anyway because I just noticed that type inference doesn't know about 
nogil environments. They are not determined before the subsequent type 
analysis step.

https://github.com/cython/cython/commit/342eb45a2fd19869273ec038144c71ac6e49db0e
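
For reference, here is a minimal plain Python 3 sketch of the behaviour
discussed above. It is not taken from the commit and is not Cython code. The
variable names are made up for illustration, and ctypes is used only as a
stand-in for what a bare char* can see.

```python
import ctypes

s = b'abc'

# A bytes object carries its length, so len() does not scan for a NUL byte.
length = len(s)

# Negative indexing counts from the end of the object.
last = s[-1]          # an int in Python 3
last_slice = s[-1:]   # a 1-byte bytes object

# Iteration yields integers in Python 3 (Python 2 yielded 1-byte strings).
items = list(s)

# A bare char* has no length: ctypes reads only up to the first NUL byte.
raw = b'ab\x00cd'
via_pointer = ctypes.c_char_p(raw).value
```

The last two lines show the length loss that comes with char*: the bytes
object knows it holds five bytes, while the pointer view stops at the
embedded NUL.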

Stefan