On 8/23/2011 9:21 AM, Victor Stinner wrote:
On 23/08/2011 15:06, "Martin v. Löwis" wrote:
Well, things have to be done in order:
1. the PEP needs to be approved
2. the performance bottlenecks need to be identified
3. optimizations should be applied.
I would not vote for the PEP if it slows down Python, especially if it's
much slower. But Torsten says that it speeds up Python, which is
surprising. I have to do my own benchmarks :-)
The current UCS2 Unicode string implementation, by design, quickly gives
WRONG answers for len(), iteration, indexing, and slicing if a string
contains any non-BMP (surrogate pair) Unicode characters. That may have
been excusable when there were essentially no such extended characters,
and the few that existed were almost never used. But now there are many
more, with more added in each Unicode edition. They include cursive math
letters that are used in English documents today. The problem will
slowly get worse and Python, at least on Windows, will become a language
to avoid for dependable Unicode document processing. 3.x needs a proper
Unicode implementation that works for all strings on all builds.
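The wrong answers come from a narrow build storing a non-BMP character
as two UTF-16 code units (a surrogate pair) and then counting code units
rather than characters. A minimal sketch of the effect, not from the
original post: on a current (wide or PEP 393) Python, len() is correct,
so the snippet reconstructs the code units a narrow build would have
stored and counted.

```python
# Hypothetical demo: emulate what a narrow (UCS-2) build stored for a
# non-BMP character by inspecting its UTF-16 code units directly.
import struct

ch = '\U0001D400'            # MATHEMATICAL BOLD CAPITAL A, outside the BMP
assert len(ch) == 1          # the correct answer on a wide/PEP 393 build

# A narrow build kept the two UTF-16 code units as two "characters":
units = struct.unpack('>2H', ch.encode('utf-16-be'))
print(hex(units[0]), hex(units[1]))   # 0xd835 0xdc00, the surrogate pair
print(len(units))                     # 2 -- what a narrow build's len() reported
```

Indexing and slicing went wrong the same way: `ch[0]` on a narrow build
returned the lone high surrogate U+D835, not the character.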
utf16.py, attached to http://bugs.python.org/issue12729
prototypes a different solution than the PEP for the above problems for
the 'mostly BMP' case. I will discuss it in a different post.
--
Terry Jan Reedy
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev