Nicholas Bastin wrote:
>> Changing the documentation that goes along with the option
>> would be fine.
>
> That is exactly what I proposed originally, which you shot down.  Please
> actually read the contents of my messages.  What I said was "change the
> configure option and related documentation".

What I mean is "change just the documentation, do not change
the configure option". This seems to be different from your
proposal, which I understand as "change both the configure
option and the documentation".

> Wow, what an inane way of looking at it.  I don't know what world you
> live in, but in my world, users read the configure options and suppose
> that they mean something.  In fact, they *have* to go off on their own
> to assume something, because even the documentation you refer to above
> doesn't say what happens if they choose UCS-2 or UCS-4.  A logical
> assumption would be that python would use those CEFs internally, and
> that would be incorrect.

Certainly. That's why the documentation should be improved. Changing
the option breaks existing packaging systems, and should not be done
lightly.

> The current implementation supports the UTF-16 CEF.  i.e., it supports a
> variable width encoding form capable of representing all of the unicode
> space using surrogate pairs.  Please point out a code point that the
> current 2 byte implementation does not support, either directly, or
> through the use of surrogate pairs.

Try to match regular expression classes for non-BMP characters:

>>> re.match(u"[\u1234]",u"\u1234").group()
u'\u1234'

works fine, but

>>> re.match(u"[\U00011234]",u"\U00011234").group()
u'\ud804'

gives strange results.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
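
A note on where the u'\ud804' in the second match comes from. The
following is only a sketch, not the interpreter's actual code path: it
assumes a 2-byte build stores U+11234 as two UTF-16 code units and that
the regex engine treats each unit as a separate class member. It is
written for Python 3, where strings are not stored that way, so the
two-unit string is built by hand. The names to_surrogate_pair,
narrow_text and narrow_match are made up for this illustration.

```python
import re


def to_surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = to_surrogate_pair(0x11234)
print(hex(high), hex(low))  # 0xd804 0xde34

# Assumption: emulate the 2-byte representation of U+11234 as two
# separate code units, one per surrogate.
narrow_text = chr(high) + chr(low)

# Seen this way, the class [\U00011234] has two members, so the match
# consumes a single code unit instead of the whole character.
narrow_match = re.match("[" + narrow_text + "]", narrow_text).group()
print(ascii(narrow_match))  # '\ud804'

# Cross-check the surrogate arithmetic against the standard UTF-16 codec.
utf16_units = "\U00011234".encode("utf-16-be")
```

The character class ends up holding the two surrogates individually,
which would explain why only the high surrogate comes back from the
match.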