After reading through the code and the comments in this thread, I propose the following definition of Py_UNICODE for the documentation:

"This type represents the storage type which is used by Python internally as the basis for holding Unicode ordinals. Extension module developers should make no assumptions about the size or native encoding of this type on any given platform."

The main point here is that extension developers cannot safely slam Py_UNICODE (which appeared to be safe when the documentation stated that it was always 16 bits).

I don't propose that we put this information in the doc, but the possible internal representations are:

    2-byte wchar_t or unsigned short, encoded as UTF-16
    4-byte wchar_t, encoded as UTF-32 (UCS-4)

If you do not explicitly set the configure option, you cannot guarantee which one you will get.

Python also does not normalize the byte order of unicode strings passed into it from C (via PyUnicode_EncodeUTF16, for example), so it is possible to have UTF-16LE and UTF-16BE strings in the system at the same time, which is a bit confusing. This may or may not be worth a mention in the doc (or a patch).

-- Nick