Cameron Simpson writes:

 > On 29Apr2009 22:14, Stephen J. Turnbull <step...@xemacs.org> wrote:
 > | Baptiste Carvello writes:
 > | > By contrast, if the new utf-8b codec would *supercede* the old one,
 > | > \udcxx would always mean raw bytes (at least on UCS-4 builds, where
 > | > surrogates are unused). Thus ambiguity could be avoided.
 > |
 > | Unfortunately, that's false. [Because Python strings are
 > | intended to be used as containers for widechars which are to be
 > | interpreted as Unicode when that makes sense, but there's no
 > | restriction against nonsense code points, including in UCS-4
 > | Python.]
[...]

 > Wouldn't you then be bypassing the implicit encoding anyway, at least to
 > some extent, and thus not trip over the PEP?

Sure. I'm not really arguing the PEP here; the point is that under the
current definition of Python strings, ambiguity is unavoidable. The best
we can ask for is fewer exceptions, and an attempt to reduce ambiguity to
a bare minimum in the code paths that we open up when we make a
definition that allows a formerly erroneous computation to succeed.

Martin is well aware of this, and the PEP is clear enough about it (to
me, but I'm a mail and multilingual editor internals kinda guy<wink>).
I'd rather have more validation of strings, but *shrug* Martin's doing
the work.

OTOH, the Unicode fans need to understand that the past policy of Python
is not to validate; Python is intended to provide all the tools needed to
write validating apps, but it isn't one itself.

Martin's PEP is quite narrow in that sense. All it is about is an
invertible encoding of broken encodings. It does have the downside that
it guarantees that Python itself can produce non-conforming strings, but
that's not the end of the world, and an app can keep track of them or
even refuse them by setting the error handler, if it wants to.
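For readers who want to see the points above concretely, here is a minimal sketch. It assumes Python 3.1 or later, where the error handler from the PEP is available under the name `surrogateescape`; the helper names (`decode_lossless`, `encode_lossless`, `is_conforming`) and the sample file name are made up for illustration and are not part of the PEP. It shows the invertible round trip, the non-conforming string that results, how an app can refuse such strings by using the strict handler, and why a \udcxx code point remains ambiguous.

```python
def decode_lossless(raw: bytes) -> str:
    """Decode as UTF-8, mapping each undecodable byte 0xNN to U+DCNN."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    """Inverse of decode_lossless: U+DCNN is turned back into byte 0xNN."""
    return text.encode("utf-8", errors="surrogateescape")


def is_conforming(text: str) -> bool:
    """Return True if the strict UTF-8 codec accepts the string.

    The strict handler rejects lone surrogates, so an app that wants to
    refuse escaped bytes can simply encode (or decode) strictly.
    """
    try:
        text.encode("utf-8")  # default error handler is "strict"
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    raw = b"caf\xe9.txt"  # Latin-1 bytes; 0xE9 is not valid UTF-8 here

    text = decode_lossless(raw)
    print(ascii(text))                   # the bad byte shows up as \udce9
    print(encode_lossless(text) == raw)  # the round trip is invertible
    print(is_conforming(text))           # strict encoding refuses the string

    # Ambiguity: nothing stops a program from building the same string
    # directly, so U+DCE9 does not prove that a raw byte was escaped.
    hand_made = "caf" + chr(0xDCE9) + ".txt"
    print(hand_made == text)
```

The last comparison is the ambiguity in question: since Python does not validate strings, a lone surrogate built by hand is indistinguishable from one produced by the error handler, on UCS-4 builds as much as anywhere else.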