Shane Hathaway wrote:
> Ok. Thanks for helping me understand where Python is WRT unicode. I
> can work around the issues (or maybe try to help solve them) now that I
> know the current state of affairs. If Python correctly handled UTF-16
> strings internally, we wouldn't need the UCS-4 configuration switch,
> would we?
Define "correctly". In UCS-2 mode, Python lets you address the individual surrogate code units of a character, e.g. when indexing. So you get

>>> u"\U00012345"[0]
u'\ud808'

This will never work "correctly", and never should, because an efficient implementation isn't possible: characters outside the BMP take two UTF-16 code units, so without extra bookkeeping, finding the n-th character means scanning the string from the start. If you want "safe" indexing and slicing, you need UCS-4.

Regards,
Martin

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
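The indexing behaviour Martin describes can be illustrated with a small sketch. It assumes a modern Python 3 (3.3 or later), which no longer has separate UCS-2/UCS-4 builds, so it simulates what a narrow build stored by encoding the string to UTF-16-LE and reading back 16-bit code units. The helper names (utf16_code_units, surrogate_pair, nth_code_point) are hypothetical and exist only for this illustration.

```python
import struct


def utf16_code_units(s):
    """Return the UTF-16 code units of s: roughly what a narrow (UCS-2)
    build of Python 2 stored internally for the string."""
    data = s.encode("utf-16-le")  # no BOM; two bytes per code unit
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def surrogate_pair(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into its
    high and low surrogates, following the UTF-16 encoding rules."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: U+%04X" % cp)
    offset = cp - 0x10000              # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def nth_code_point(units, n):
    """Return the n-th code point of a list of UTF-16 code units.

    Because a code point may occupy one or two units, this must walk
    the list from the start: O(n) per lookup, not O(1)."""
    i = count = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # Well-formed surrogate pair: combine into one code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = u
            step = 1
        if count == n:
            return cp
        count += 1
        i += step
    raise IndexError("code point index out of range")


if __name__ == "__main__":
    units = utf16_code_units("\U00012345")
    print([hex(u) for u in units])          # two units for one character
    print(hex(nth_code_point(utf16_code_units("a\U00012345b"), 2)))
```

Indexing the list of code units directly (units[0]) gives 0xd808, the lone high surrogate that the narrow build returned as u'\ud808'. Getting the whole character back requires the linear scan in nth_code_point, which is the cost Martin refers to; a UCS-4 (or, in current Python, PEP 393) representation avoids it by storing one fixed-width unit per code point.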