Robert Bradshaw, 14.02.2013 06:51:
> I've proposed having a compiler
> directive that lets you specify an encoding (e.g. ascii, utf8) and
> automatically encodes/decodes when converting between C and Python
> strings.

My main objection against that is that it would only work in one direction, from C strings to Python strings. The other direction requires an explicit intermediate bytes object in order to do the memory management correctly, so there's really nothing to win there. Doing anything implicit in that direction would just call for either trouble or inefficiency.

For the first direction, C-to-Python, I don't see a major advantage of the implicit

    cdef unicode py_string = c_string    # typing required here

over the explicit

    py_string = c_string.decode('utf-8')  # note: no typing here

There is only one case where it's a bit simpler:

    py_string = c_string[:length]   # no typing, auto-coercion

in contrast to

    py_string = c_string[:length].decode('utf-8')

Either way, the difference is just a couple of characters, and those are best hidden in an explicit "conversion + validation" function anyway. Auto-coercion of C strings will always be less efficient and more error prone than users should be asked to bear, and all we could add would be the unidirectional conversion part, not any validation or whatever else user code has to do in addition.

The situation is entirely different for C++ strings. They have an efficient two-way auto-coercion and safely copy their content on creation. In their case, auto-coercion would basically behave like

    from __future__ import unicode_literals

but for string coercion. I have no objections against that. I think it just needs implementing and then testing against a couple of real, existing code bases to see what the real-world tradeoff is. It's just a matter of whether a user needs to write "<unicode>" or "<bytes>" in the right places.

All of that being said, the proposal sounds like it's actually two: 1) specify an implicit encoding for coercion between C++ strings and Python unicode strings, and 2) automatically coerce between C++ strings and Python unicode strings by default.

1) means that

    cdef libcpp.string cs1 = ..., cs2
    py_string = <unicode>cs1
    cs2 = py_string

would auto-decode and -encode the string, 2) means that

    cdef libcpp.string cs1 = ..., cs2
    py_string = <object>cs1
    cs2 = py_string

would do it (including any implicit coercions to Python objects).

If 2) is desirable at all, I think it makes sense to split this into two separate directives, as many users will be better off without the second one.

There's also the question of whether you want coercion to and from "unicode" or to and from "str". Getting the latter right wouldn't be easy, most likely neither for us nor for users who want to apply it to their code. However, given that the only use case for that would be Py2 backwards compatibility, waiting a couple of years longer should nicely solve this problem for us. No need to burden the compiler with it now.

Stefan
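
For illustration, the explicit "conversion + validation" function mentioned above could look roughly like the following. This is only a sketch in plain Python, where a bytes object stands in for the C char*; the helper names, the length check and the NUL-byte check are assumptions made for the example, not anything Cython provides.

```python
def c_string_to_unicode(c_string, length=None, encoding="utf-8"):
    """Hypothetical helper: decode a byte buffer received from C into str.

    `c_string` (a bytes object) stands in for a C char*. The slice and
    decode below mirror `c_string[:length].decode('utf-8')`.
    """
    if length is not None:
        if length < 0 or length > len(c_string):
            raise ValueError("length out of range")
        c_string = c_string[:length]
    # Strict decoding raises UnicodeDecodeError on malformed input,
    # which is the "validation" half of the helper.
    return c_string.decode(encoding)


def unicode_to_c_bytes(py_string, encoding="utf-8"):
    """Hypothetical helper for the Python-to-C direction.

    The returned bytes object is the explicit intermediate that owns the
    memory; the caller has to keep a reference to it for as long as any
    C pointer into it is in use.
    """
    encoded = py_string.encode(encoding)
    # Example validation rule (an assumption): reject embedded NUL bytes,
    # since C code would treat the first one as the end of the string.
    if b"\x00" in encoded:
        raise ValueError("embedded NUL byte would truncate the C string")
    return encoded
```

User code would then call c_string_to_unicode(buf, n) in the one place where the encoding is known, instead of relying on an implicit coercion at every assignment.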