On 6/21/2010 1:29 PM, Guido van Rossum wrote:
Actually, the big problem with Python 2 is that if you mix str and
unicode, things work or crash depending on whether any of the str
objects involved contain non-ASCII bytes.
If one API decides to upgrade to Unicode, the result, when passed to
another API, may well cause a UnicodeError because not all arguments
have had the same treatment.
Now, the APIs are neither safe nor aware -- if you pass bytes in, you get
unpredictable results back.
This seems an overgeneralization of a particular bug. There are APIs
that are strictly text-in, text-out. There are others that are
bytes-in, bytes-out. Let's call all those *pure*. For some operations
it makes sense that the API is *polymorphic*, with which I mean that
text-in causes text-out, and bytes-in causes byte-out. All of these
are fine.
Perhaps there are more situations where a polymorphic API would be
helpful. Such APIs are not always so easy to implement, because they
have to be careful with literals or other constants (and even more so
mutable state) used internally -- but it can be done, and there are
plenty of examples in the stdlib.
The real problem apparently lies in (what I believe is only a few
rare) APIs that are text-or-bytes-in and always-text-out (or
always-bytes-out). Let's call them *hybrid*. Clearly, mixing hybrid
APIs in a stream of pure or polymorphic API calls is a problem,
because they turn a pure or polymorphic overall operation into a
hybrid one.
There are also text-in, bytes-out or bytes-in, text-out APIs that are
intended for encoding/decoding of course, but these are in a totally
different class.
Abstractly, it would be good if there were as few as possible hybrid
APIs, many pure or polymorphic APIs (which it should be in a
particular case is a pragmatic choice), and a limited number of
encoding/decoding APIs, which should generally be invoked at the edges
of the program (e.g., I/O).
Nice summary of part of the 'why' for Python3.
I still believe that believe that the instances of bytes silently
succeeding *some* of the time refers to specific bugs in specific
APIs, either intentional because of misguided compatibility desires,
or accidental in the haste of trying to convert the entire stdlib to
Python 3 in a finite time.
I think http://bugs.python.org/issue5468 reports one aspect of haste,
missing encoding and errors paramaters. But it has not gotten much
attention.
--
Terry Jan Reedy
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com