On 8/24/2011 1:18 AM, "Martin v. Löwis" wrote:
So am I correctly reading between the lines when, after reading this
thread and the complete issue discussion so far, I see a
PEP 393 revision or replacement that has the following characteristics:

1) Narrow builds are dropped.
PEP 393 already drops narrow builds.

I'd forgotten that.


2) There are more, or different, internal kinds of strings, which affect
the processing patterns.
This is the basic idea of PEP 393.

Agreed.

a) all ASCII
b) latin-1 (8-bit codepoints, the first 256 Unicode codepoints) This
kind may not be able to support a "mostly" variation, and may be no more
efficient than case b).  But it might also be popular in parts of Europe.
These two cases are already in PEP 393.
Sure.  Wanted to enumerate all, rather than just add-ons.

c) mostly ASCII (utf8) with clever indexing/caching to be efficient
d) UTF-8 with clever indexing/caching to be efficient
I see neither a need nor a means to consider these.

The discussion about "mostly ASCII" strings makes a convincing case that there could be significant space savings if they were implemented.
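As a rough illustration of the space argument, here is a minimal Python sketch. It is not taken from PEP 393 or from this thread: the helper names fixed_width_size and utf8_size are hypothetical, the 1/2/4-byte widths are assumed from the 8-bit, 16-bit and 32-bit kinds listed here, and object header overhead and any index or cache a UTF-8 kind would need are ignored.

```python
def fixed_width_size(text):
    """Payload bytes when every code point is stored at the width
    required by the largest code point in the string (1, 2 or 4 bytes)."""
    if not text:
        return 0
    largest = max(map(ord, text))
    if largest < 0x100:
        width = 1
    elif largest < 0x10000:
        width = 2
    else:
        width = 4
    return width * len(text)


def utf8_size(text):
    """Payload bytes for the same text encoded as UTF-8."""
    return len(text.encode("utf-8"))


# A "mostly ASCII" string: 999 ASCII characters plus one euro sign.
sample = "x" * 999 + "\u20ac"
print(fixed_width_size(sample), utf8_size(sample))
```

In this sketch a single non-Latin-1 character forces the whole fixed-width string to two bytes per character, while UTF-8 pays extra only for that one character. The cost on the UTF-8 side, which the sketch does not model, is that indexing is no longer constant time without the "clever indexing/caching".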

e) 16-bit codepoints
These are in PEP 393.

f) UTF-16 with clever indexing/caching to be efficient
Again, -1.

This is probably the one I would pick as least likely to be useful if the rest were implemented.

g) 32-bit codepoints
This is in PEP 393.

h) UTF-32
What's that, as opposed to g)?

g) would permit codes greater than U+10FFFF and would permit the illegal codepoints and lone surrogates. h) would require strict Unicode conformance. Sorry that the 4 paragraphs of explanation that you didn't quote didn't make that clear.
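The g) versus h) distinction can be sketched as a validity check over 32-bit units. This is only an illustration under stated assumptions: the function names are hypothetical, and the check covers just the two conditions that are unambiguous here (values above U+10FFFF and surrogates); whatever else "illegal codepoints" was meant to include is not specified in this message.

```python
def is_unicode_scalar_value(code):
    """True if code is in the Unicode range and is not a surrogate."""
    return 0 <= code <= 0x10FFFF and not (0xD800 <= code <= 0xDFFF)


def is_strict_utf32(units):
    """True if every 32-bit unit would be acceptable under kind h).

    Kind g) would accept any sequence of 32-bit values, so it needs
    no such check.
    """
    return all(is_unicode_scalar_value(u) for u in units)


print(is_strict_utf32([0x41, 0x20AC, 0x10FFFF]))  # all in range, no surrogates
print(is_strict_utf32([0x41, 0xD800]))            # contains a lone surrogate
print(is_strict_utf32([0x110000]))                # beyond U+10FFFF
```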

I'm not open to revising PEP 393 in the direction of adding more
representations.

It's your PEP.