Re: [Python-Dev] New Py_UNICODE doc

Nicholas Bastin Fri, 06 May 2005 11:49:28 -0700

On May 6, 2005, at 3:17 AM, M.-A. Lemburg wrote:

> You've got that wrong: Python let's you choose UCS-4 -
> UCS-2 is the default.
>
> Note that Python's Unicode codecs UTF-8 and UTF-16
> are surrogate aware and thus support non-BMP code points
> regardless of the build type: A UCS2-build of Python will
> store a non-BMP code point as UTF-16 surrogate pair in the
> Py_UNICODE buffer while a UCS4 build will store it as a
> single value. Decoding is surrogate aware too, so a UTF-16
> surrogate pair in a UCS2 build will get treated as single
> Unicode code point.


If this is the case, then we're clearly misleading users.  If the 
configure script says UCS-2, then as a user I would assume that 
surrogate pairs would *not* be encoded, because I chose UCS-2, and it 
doesn't support that.  I would assume that any UTF-16 string I would 
read would be transcoded into the internal type (UCS-2), and 
information would be lost.  If this is not the case, then what does the 
configure option mean?

--
Nick

_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] New Py_UNICODE doc

Reply via email to