Shane Hathaway wrote:
> Ok.  Thanks for helping me understand where Python is WRT unicode.  I
> can work around the issues (or maybe try to help solve them) now that I
> know the current state of affairs.  If Python correctly handled UTF-16
> strings internally, we wouldn't need the UCS-4 configuration switch,
> would we?

Define "correctly". Python, in ucs2 mode, lets you address individual
surrogate code units, e.g. when indexing. So you get

>>> u"\U00012345"[0]
u'\ud808'

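This will never work "correctly", and never should, because indexing by
code point can't be implemented efficiently on top of 16-bit code units:
finding the n-th character would mean scanning the string for surrogate
pairs. If you want "safe" indexing and slicing, you need ucs4.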
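
To make the cost concrete, here is a rough sketch of what "safe" indexing
over UTF-16 code units would involve. The helper names utf16_units and
code_point_at are made up for illustration and are not part of Python; the
sketch only emulates a ucs2 build by encoding to UTF-16 explicitly.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def code_point_at(units, index):
    """Return the code point at position index, counting a surrogate pair as one.

    Hypothetical helper: it has to walk the units from the start, so the
    lookup is linear in the string length rather than constant time.
    """
    i = 0
    pos = 0
    while i < len(units):
        unit = units[i]
        is_pair = (
            0xD800 <= unit <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            # Combine the high and low surrogate into one code point.
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = unit
            step = 1
        if pos == index:
            return cp
        pos += 1
        i += step
    raise IndexError(index)


units = utf16_units(u"a\U00012345b")
print([hex(u) for u in units])       # four code units for three characters
print(hex(code_point_at(units, 1)))  # the full code point, found by scanning
```

Plain indexing into the units is constant time but can land on half of a
surrogate pair; the scanning version gives whole code points but has to
look at everything before the requested position.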

Regards,
Martin