On 3 January 2011 04:41, Stefan Behnel <[email protected]> wrote:
> Hi,
>
> I've been working on a fix for ticket #602, negative indexing for inferred
> char*.
>
> http://trac.cython.org/cython_trac/ticket/602
>
> Currently, when you write this:
>
>     s = b'abc'
>
> s is inferred as char*. This has several drawbacks. For one, we lose the
> length information, so "len(s)" becomes O(n) instead of O(1). Negative
> indexing fails completely because it will use pointer arithmetic, thus
> leaving the allocated memory area of the string. Also, code like the
> following is extremely inefficient because it requires multiple conversions
> from a char* of unknown length to a Python bytes object:
>
>     s = b'abc'
>     a = s1 + s
>     b = s2 + s
>
> I came to the conclusion that the right fix is to stop letting byte string
> literals start off as char*. This immediately fixes these issues and
> improves Python compatibility while still allowing automatic coercion, but
> it also comes with its own drawbacks.
>
> In nogil blocks, you will have to explicitly declare a variable as char*
> when assigning a byte string literal to it, otherwise you'd get a compile
> time error for a Python object assignment. I think this is a minor issue as
> most users would declare their variables anyway when using nogil blocks.
> Given that there isn't much you can do with a Python string inside of a
> nogil block, we could also honour nogil blocks during type inference and
> automatically infer char* for literals here. I don't think it would hurt
> anyone to do that.
>
> The second drawback is that it impacts type inference for char loops.
> Previously, you could write
>
>     s = b'abc'
>     for c in s:
>         print c
>
> and Cython would infer 'char' for c and print integer byte values. When s
> is inferred as 'bytes', c will be inferred as 'Python object' because
> Python 2 returns 1-byte strings and Python 3 returns integers on iteration.
> Thus the loop will run entirely in Python code and return different things
> in Py2 and Py3.
>
> I do not expect that this is a major issue either. Iteration over literals
> should be rare, after all, and if the byte string is constructed in any
> way, the type either becomes a bytes object through Python operations (like
> concatenation) or is explicitly provided, e.g. as a return type of a
> function call. But it is a clear behavioural change for the type inference
> in an area where Cython's (and Python's) semantics are tricky anyway.
>
> Personally, I think that the advantages outweigh the disadvantages here.
> Most common use cases won't notice the change because coercion will not be
> impacted, and most existing code (IMHO) either uses explicit typing or
> expects a Python bytes object anyway. So my preferred change would be to
> make byte string literals 'bytes' by default, except in nogil blocks.
>
> Opinions?
>
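
For reference, the bytes-object semantics described in the quoted message can be checked in plain Python. This is only a minimal sketch run under the regular interpreter, not a statement about what Cython's type inference does; the variable names are illustrative and do not come from the ticket.

```python
s = b'abc'

# A bytes object stores its length, so len() does not need to scan
# for a terminating NUL the way strlen() does on a char*.
length = len(s)

# Negative indices count back from the end of the object instead of
# doing pointer arithmetic. Slicing returns bytes on Python 2 and 3.
last = s[-1:]

# Iterating yields 1-byte strings on Python 2 and integers on Python 3.
items = list(s)

# Going through bytearray gives integer byte values on both versions.
byte_values = list(bytearray(s))

print(length, last, items, byte_values)
```

Code that needs integer byte values on both Python versions can iterate over a bytearray as above, or declare the variable types explicitly in Cython.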
+1

--
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
