Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Stephen J. Turnbull
Nick Coghlan writes: > GvR writes: > > Let's just define a Unicode string to be a sequence of code points and > > let libraries deal with the rest. Ok, methods like lower() should > > consider characters, but indexing/slicing should refer to code points. > > Same for '=='; we can have a libra

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Glenn Linderman
On 8/24/2011 7:29 PM, Guido van Rossum wrote: (Hey, I feel a QOTW coming. "Standards? We don't need no stinkin' standards." http://en.wikipedia.org/wiki/Stinking_badges :-) Which deserves an appropriate, follow-on, misquote: Guido says the Unicode standard stinks. ˚͜˚ <- and a Unicode smiley

Re: [Python-Dev] PEP 393 review

2011-08-24 Thread Stefan Behnel
"Martin v. Löwis", 24.08.2011 20:15: Guido has agreed to eventually pronounce on PEP 393. Before that can happen, I'd like to collect feedback on it. There have been a number of voices supporting the PEP in principle Absolutely. - conditions you would like to pose on the implementation before

Re: [Python-Dev] PEP 393 review

2011-08-24 Thread Stefan Behnel
Victor Stinner, 25.08.2011 00:29: With this PEP, the unicode object overhead grows to 10 pointer-sized words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine. Does it have any adverse effects? For pure ASCII, it might be possible to use a shorter struct: typedef struct { PyO

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Stephen J. Turnbull
Guido van Rossum writes: > On Wed, Aug 24, 2011 at 5:31 PM, Stephen J. Turnbull > wrote: > >    Strings contain Unicode code units, which for most purposes can be > >    treated as Unicode characters.  However, even as "simple" an > >    operation as "s1[0] == s2[0]" cannot be relied upon t

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Nick Coghlan
On Thu, Aug 25, 2011 at 1:11 PM, Guido van Rossum wrote: >> With narrow builds, code units can currently come into play >> internally, but with PEP 393 everything internal will be working >> directly with code points. Normalisation, combining characters and >> bidi issues may still affect the corr
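A minimal Python sketch of the normalisation point raised above (not taken from the thread): even when a string is a sequence of code points, two strings that render identically can compare unequal until both are normalised. It relies only on the standard `unicodedata` module; the sample strings and variable names are illustrative assumptions.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Code-point comparison and len() see two different sequences.
raw_equal = composed == decomposed
lengths = (len(composed), len(decomposed))

# After NFC normalisation both strings hold the same code point sequence.
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))
```

This is the kind of check a library layered on top of a code-point string type could offer, which is the division of labour suggested in the thread.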

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Guido van Rossum
On Wed, Aug 24, 2011 at 7:47 PM, Nick Coghlan wrote: > On Thu, Aug 25, 2011 at 12:29 PM, Guido van Rossum wrote: >> Now I am happy to admit that for many Unicode issues the level at >> which we have currently defined things (code units, I think -- the >> thingies that encodings are made of) is co

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Nick Coghlan
On Thu, Aug 25, 2011 at 12:29 PM, Guido van Rossum wrote: > Now I am happy to admit that for many Unicode issues the level at > which we have currently defined things (code units, I think -- the > thingies that encodings are made of) is confusing, and it would be > better to switch to the others (

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Guido van Rossum
On Wed, Aug 24, 2011 at 5:36 PM, Stephen J. Turnbull wrote: > Guido van Rossum writes: > >  > I see nothing wrong with having the language's fundamental data types >  > (i.e., the unicode object, and even the re module) to be defined in >  > terms of codepoints, not characters, and I see nothing w

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Guido van Rossum
On Wed, Aug 24, 2011 at 5:31 PM, Stephen J. Turnbull wrote: > Terry Reedy writes: > >  > Please suggest a re-wording then, as it is a bug for doc and behavior to >  > disagree. > >    Strings contain Unicode code units, which for most purposes can be >    treated as Unicode characters.  However, e

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Stephen J. Turnbull
Guido van Rossum writes: > I see nothing wrong with having the language's fundamental data types > (i.e., the unicode object, and even the re module) to be defined in > terms of codepoints, not characters, and I see nothing wrong with > len() returning the number of codepoints (as long as it i

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Stephen J. Turnbull
Terry Reedy writes: > Please suggest a re-wording then, as it is a bug for doc and behavior to > disagree. Strings contain Unicode code units, which for most purposes can be treated as Unicode characters. However, even as "simple" an operation as "s1[0] == s2[0]" cannot be relied

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Stephen J. Turnbull
Antoine Pitrou writes: > On Thursday 25 August 2011 at 02:15 +0900, Stephen J. Turnbull wrote: > > Antoine Pitrou writes: > > > On Thu, 25 Aug 2011 01:34:17 +0900 > > > "Stephen J. Turnbull" wrote: > > > > > > > > Martin has long claimed that the fact that I/O is done in terms of > > > >

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Guido van Rossum
On Wed, Aug 24, 2011 at 3:29 PM, Glenn Linderman wrote: > It would seem helpful if the stdlib could have some support for efficient > handling of Unicode characters in some representation.  It would help > address the class of applications that does care. I claim that we have insufficient underst

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Tim Delaney
On 25 August 2011 07:10, Victor Stinner wrote: > > I used stringbench and "./python -m test test_unicode". I plan to try > iobench. > > Which other benchmark tool should be used? Should we write a new one? I think that the PyPy benchmarks (or at least selected tests such as slowspitfire) would p

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Glenn Linderman
On 8/24/2011 12:34 PM, Guido van Rossum wrote: On Wed, Aug 24, 2011 at 11:52 AM, Glenn Linderman wrote: On 8/24/2011 9:00 AM, Stefan Behnel wrote: Nick Coghlan, 24.08.2011 15:06: On Wed, Aug 24, 2011 at 10:46 AM, Terry Reedy wrote: In utf16.py, attached to http://bugs.python.org/issue12729 I

Re: [Python-Dev] PEP 393 review

2011-08-24 Thread Victor Stinner
> With this PEP, the unicode object overhead grows to 10 pointer-sized > words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine. > Does it have any adverse effects? For pure ASCII, it might be possible to use a shorter struct: typedef struct { PyObject_HEAD Py_ssize_t length

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Martin v. Löwis
> For Windows users, I believe it will nearly double the memory footprint > if there are any non-BMP chars. On my new machine, I should not mind > that in exchange for correct behavior. In addition, strings with non-BMP chars are much more rare than strings with all Latin-1, for which memory usage
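The memory trade-off discussed above can be observed with a small sketch, assuming a CPython release that implements PEP 393 (3.3 or later). The helper name `bytes_per_char` is hypothetical; it compares two freshly built strings of the same kind so that the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character by differencing two string sizes."""
    short = ch * n
    long_ = ch * (2 * n)
    return (sys.getsizeof(long_) - sys.getsizeof(short)) / n

latin1_cost = bytes_per_char("\u00e9")       # Latin-1 range
bmp_cost = bytes_per_char("\u20ac")          # BMP, outside Latin-1
astral_cost = bytes_per_char("\U0001F600")   # non-BMP
```

Under that assumption the three results are 1, 2 and 4 bytes per character, so only strings that actually contain a non-BMP character pay the 4-byte price.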

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On Wednesday 24 August 2011 20:52:51, Glenn Linderman wrote: > Given the required variability of character size in all presently > Unicode defined encodings, I tend to agree with Tom that UTF-8, together > with some technique of translating character index to code unit offset, > may provide the bes
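One possible reading of "translating character index to code unit offset" is a sparse checkpoint table over UTF-8 bytes. The sketch below is an assumption for illustration only, not a design proposed in the thread; the class name `Utf8Text`, the `stride` parameter and the method names are all hypothetical.

```python
class Utf8Text:
    """UTF-8 storage with a checkpoint every `stride` code points."""

    def __init__(self, text, stride=4):
        self.data = text.encode("utf-8")
        self.stride = stride
        self.checkpoints = []  # byte offset of code points 0, stride, 2*stride, ...
        count = 0
        for offset, byte in enumerate(self.data):
            if byte & 0xC0 != 0x80:  # not a continuation byte: starts a code point
                if count % stride == 0:
                    self.checkpoints.append(offset)
                count += 1
        self.length = count

    def offset_of(self, index):
        """Return the byte offset of the code point at `index`."""
        if not 0 <= index < self.length:
            raise IndexError(index)
        offset = self.checkpoints[index // self.stride]
        for _ in range(index % self.stride):
            offset += 1
            while self.data[offset] & 0xC0 == 0x80:
                offset += 1  # skip continuation bytes
        return offset

    def __getitem__(self, index):
        start = self.offset_of(index)
        end = start + 1
        while end < len(self.data) and self.data[end] & 0xC0 == 0x80:
            end += 1
        return self.data[start:end].decode("utf-8")


sample = Utf8Text("a\u00e9\u20ac\U0001F600z", stride=2)
```

Indexing costs at most `stride - 1` forward steps from a checkpoint, so the table trades memory for lookup time; a smaller stride means faster indexing and a larger table.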

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Ethan Furman
Terry Reedy wrote: PEP-393 provides support of the full Unicode charset (U+0000-U+10FFFF) on all platforms with a small memory footprint and only O(1) functions. For Windows users, I believe it will nearly double the memory footprint if there are any non-BMP chars. On my new machine, I should

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Terry Reedy
On 8/24/2011 1:45 PM, Victor Stinner wrote: On 24/08/2011 02:46, Terry Reedy wrote: I don't think that using UTF-16 with surrogate pairs is really a big problem. A lot of work has been done to hide this. For example, repr(chr(0x10ffff)) now displays '\U0010ffff' instead of two characters. E

Re: [Python-Dev] sendmsg/recvmsg on Mac OS X

2011-08-24 Thread Ned Deily
In article <20110824205047.6be49...@pitrou.net>, Antoine Pitrou wrote: > On Wed, 24 Aug 2011 11:37:20 -0700 > Ned Deily wrote: > > In article <20110824184927.2697b...@pitrou.net>, > > Antoine Pitrou wrote: > > > On Wed, 24 Aug 2011 15:31:50 +0200 > > > Charles-François Natali wrote: > > > > >

Re: [Python-Dev] sendmsg/recvmsg on Mac OS X

2011-08-24 Thread Ned Deily
In article , Charles-Francois Natali wrote: > > But Snow Leopard, where these failures occur, is OS X 10.6. > > *sighs* > It still looks like a kernel/libc bug to me: AFAICT, both the code and > the tests are correct. > And apparently, there are still issues pertaining to FD passing on > 10.5 (

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Terry Reedy
On 8/24/2011 12:34 PM, Stephen J. Turnbull wrote: Terry Reedy writes: > Excuse me for believing the fine 3.2 manual that says > "Strings contain Unicode characters." The manual is wrong, then, subject to a pronouncement to the contrary, Please suggest a re-wording then, as it is a bug f

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Guido van Rossum
On Wed, Aug 24, 2011 at 11:52 AM, Glenn Linderman wrote: > On 8/24/2011 9:00 AM, Stefan Behnel wrote: > > Nick Coghlan, 24.08.2011 15:06: > > On Wed, Aug 24, 2011 at 10:46 AM, Terry Reedy wrote: > > In utf16.py, attached to http://bugs.python.org/issue12729 > I propose for consideration a prototyp

Re: [Python-Dev] sendmsg/recvmsg on Mac OS X

2011-08-24 Thread Charles-François Natali
> But Snow Leopard, where these failures occur, is OS X 10.6. *sighs* It still looks like a kernel/libc bug to me: AFAICT, both the code and the tests are correct. And apparently, there are still issues pertaining to FD passing on 10.5 (and maybe later, I couldn't find a public access to their bug

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Glenn Linderman
On 8/24/2011 9:00 AM, Stefan Behnel wrote: Nick Coghlan, 24.08.2011 15:06: On Wed, Aug 24, 2011 at 10:46 AM, Terry Reedy wrote: In utf16.py, attached to http://bugs.python.org/issue12729 I propose for consideration a prototype of a different solution to the 'mostly BMP chars, few non-BMP chars'

Re: [Python-Dev] sendmsg/recvmsg on Mac OS X

2011-08-24 Thread Antoine Pitrou
On Wed, 24 Aug 2011 11:37:20 -0700 Ned Deily wrote: > In article <20110824184927.2697b...@pitrou.net>, > Antoine Pitrou wrote: > > On Wed, 24 Aug 2011 15:31:50 +0200 > > Charles-François Natali wrote: > > > > The buildbots are complaining about some of tests for the new > > > > socket.sendmsg/

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Terry Reedy
On 8/24/2011 1:50 PM, "Martin v. Löwis" wrote: I'd like to point out that the improved compatibility is only a side effect, not the primary objective of the PEP. Then why does the Rationale start with "on systems only supporting UTF-16, users complain that non-BMP characters are not properly

Re: [Python-Dev] sendmsg/recvmsg on Mac OS X

2011-08-24 Thread Ned Deily
In article <20110824184927.2697b...@pitrou.net>, Antoine Pitrou wrote: > On Wed, 24 Aug 2011 15:31:50 +0200 > Charles-François Natali wrote: > > > The buildbots are complaining about some of tests for the new > > > socket.sendmsg/recvmsg added by issue #6560 for *nix platforms that > > > provide

Re: [Python-Dev] PEP 393 review

2011-08-24 Thread Antoine Pitrou
On Wed, 24 Aug 2011 20:15:24 +0200 "Martin v. Löwis" wrote: > - issues to be considered (unclarities, bugs, limitations, ...) With this PEP, the unicode object overhead grows to 10 pointer-sized words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine. Does it have any adverse effects
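The 80-byte figure is simply ten pointer-sized words at 8 bytes each. A small sketch of that arithmetic follows; the helper `header_bytes` is a hypothetical name, and the word count of 10 is the figure quoted in the message rather than something measured here.

```python
import struct

POINTER_WORDS = 10  # overhead quoted in the message above, in pointer-sized words

def header_bytes(words=POINTER_WORDS, word_size=struct.calcsize("P")):
    """Overhead in bytes for a given word count and pointer size."""
    return words * word_size

on_64_bit = header_bytes(word_size=8)
on_32_bit = header_bytes(word_size=4)
```

With 8-byte pointers this gives 80 bytes, and with 4-byte pointers 40 bytes; the default `word_size` uses the pointer size of the running interpreter.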

[Python-Dev] PEP 393 review

2011-08-24 Thread Martin v. Löwis
Guido has agreed to eventually pronounce on PEP 393. Before that can happen, I'd like to collect feedback on it. There have been a number of voices supporting the PEP in principle, so I'm now interested in comments in the following areas: - principle objection. I'll list them in the PEP. - issues t

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On 24/08/2011 11:22, Glenn Linderman wrote: c) mostly ASCII (utf8) with clever indexing/caching to be efficient d) UTF-8 with clever indexing/caching to be efficient I see neither a need nor a means to consider these. The discussion about "mostly ASCII" strings seems convincing that there c

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Martin v. Löwis
>> Eg, display of characters in the interpreter. > > I don't know why you say it's "done in terms of UTF-16", then. Unicode > strings are simply encoded to whatever character set is detected as the > terminal's character set. I think what he means (and what I meant when I said something similar):

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Martin v. Löwis
> > PEP 393 abolishes narrow builds as we now know them and changes > > semantics. I was answering a complaint about that change. If you do > > not like the PEP, fine. > > No, I do like the PEP. However, it is only a step, a rather > conservative one in some ways, toward conformance to the Uni

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On 24/08/2011 02:46, Terry Reedy wrote: On 8/23/2011 9:21 AM, Victor Stinner wrote: On 23/08/2011 15:06, "Martin v. Löwis" wrote: Well, things have to be done in order: 1. the PEP needs to be approved 2. the performance bottlenecks need to be identified 3. optimizations should be applied.

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Antoine Pitrou
On Thursday 25 August 2011 at 02:15 +0900, Stephen J. Turnbull wrote: > Antoine Pitrou writes: > > On Thu, 25 Aug 2011 01:34:17 +0900 > > "Stephen J. Turnbull" wrote: > > > > > > Martin has long claimed that the fact that I/O is done in terms of > > > UTF-16 means that the internal representati

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Stephen J. Turnbull
Antoine Pitrou writes: > On Thu, 25 Aug 2011 01:34:17 +0900 > "Stephen J. Turnbull" wrote: > > > > Martin has long claimed that the fact that I/O is done in terms of > > UTF-16 means that the internal representation is UTF-16 > > Which I/O? Eg, display of characters in the interpreter.

Re: [Python-Dev] FileSystemError or FilesystemError?

2011-08-24 Thread Vlad Riscutia
+1 for FileSystemError. I see myself misspelling it as FileSystemError if we go with the alternate spelling. I probably won't be the only one. Thank you, Vlad On Wed, Aug 24, 2011 at 4:09 AM, Eli Bendersky wrote: > > When reviewing the PEP 3151 implementation (*), Ezio commented that >> "FileSys

Re: [Python-Dev] sendmsg/recvmsg on Mac OS X

2011-08-24 Thread Antoine Pitrou
On Wed, 24 Aug 2011 15:31:50 +0200 Charles-François Natali wrote: > > The buildbots are complaining about some of tests for the new > > socket.sendmsg/recvmsg added by issue #6560 for *nix platforms that > > provide CMSG_LEN. > > Looks like kernel bugs: > http://developer.apple.com/library/mac/#q

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Antoine Pitrou
On Thu, 25 Aug 2011 01:34:17 +0900 "Stephen J. Turnbull" wrote: > > Martin has long claimed that the fact that I/O is done in terms of > UTF-16 means that the internal representation is UTF-16 Which I/O? ___ Python-Dev mailing list Python-Dev@python

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Stephen J. Turnbull
Terry Reedy writes: > Excuse me for believing the fine 3.2 manual that says > "Strings contain Unicode characters." The manual is wrong, then, subject to a pronouncement to the contrary, of course. I was on your side of the fence when this was discussed, pre-release. I was wrong then. My bet

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Stefan Behnel
Nick Coghlan, 24.08.2011 15:06: On Wed, Aug 24, 2011 at 10:46 AM, Terry Reedy wrote: In utf16.py, attached to http://bugs.python.org/issue12729 I propose for consideration a prototype of a different solution to the 'mostly BMP chars, few non-BMP chars' case. Rather than expand every character from

Re: [Python-Dev] sendmsg/recvmsg on Mac OS X

2011-08-24 Thread Charles-François Natali
> The buildbots are complaining about some of tests for the new > socket.sendmsg/recvmsg added by issue #6560 for *nix platforms that > provide CMSG_LEN. Looks like kernel bugs: http://developer.apple.com/library/mac/#qa/qa1541/_index.html """ Yes. Mac OS X 10.5 fixes a number of kernel bugs rela

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Nick Coghlan
On Wed, Aug 24, 2011 at 10:46 AM, Terry Reedy wrote: > In utf16.py, attached to http://bugs.python.org/issue12729 > I propose for consideration a prototype of a different solution to the 'mostly > BMP chars, few non-BMP chars' case. Rather than expand every character from > 2 bytes to 4, attach an a

[Python-Dev] sendmsg/recvmsg on Mac OS X

2011-08-24 Thread Nick Coghlan
The buildbots are complaining about some of tests for the new socket.sendmsg/recvmsg added by issue #6560 for *nix platforms that provide CMSG_LEN. http://www.python.org/dev/buildbot/all/builders/AMD64%20Snow%20Leopard%202%203.x/builds/831/steps/test/logs/stdio Before I start trying to figure thi

Re: [Python-Dev] FileSystemError or FilesystemError?

2011-08-24 Thread Eli Bendersky
> When reviewing the PEP 3151 implementation (*), Ezio commented that > "FileSystemError" looks a bit strange and that "FilesystemError" would > be a better spelling. What is your opinion? > > (*) http://bugs.python.org/issue12555 > +1 for FileSystemError Eli

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Martin v. Löwis
>> I think the value for wstr/uninitialized/reserved should not be >> removed. The wstr representation is still used in the error case in >> the utf8 decoder because these strings can be resized. > > In Python, you can resize an object if it has only one reference. Why is > it not possible in you

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Terry Reedy
On 8/24/2011 4:22 AM, Stephen J. Turnbull wrote: Terry Reedy writes: > The current UCS2 Unicode string implementation, by design, quickly gives > WRONG answers for len(), iteration, indexing, and slicing if a string > contains any non-BMP (surrogate pair) Unicode characters. That may ha
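To illustrate why a UTF-16 (narrow-build) representation can report a different length than the number of code points, here is a small sketch. The helper names `utf16_code_units` and `surrogate_pair` are hypothetical; the code is assumed to run on a Python where `len()` counts code points (3.3 or later), and it computes the UTF-16 view explicitly instead of depending on a narrow build.

```python
def utf16_code_units(text):
    """Number of 16-bit code units needed to store `text` as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

def surrogate_pair(code_point):
    """Split a non-BMP code point into its (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a non-BMP code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

sample = "a\U0001F600"                   # one BMP and one non-BMP character
code_points = len(sample)                # counts code points
code_units = utf16_code_units(sample)    # what a UTF-16 length would report
high, low = surrogate_pair(0x1F600)
```

Indexing by code unit would land on `high` or `low` individually, which is the source of the wrong answers for indexing and slicing described in the message.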

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Martin v. Löwis
On 24.08.2011 10:17, Victor Stinner wrote: > On 24/08/2011 04:41, Torsten Becker wrote: >> On Tue, Aug 23, 2011 at 18:27, Victor Stinner >> wrote: >>> I posted a patch to re-add it: >>> http://bugs.python.org/issue12819#msg142867 >> >> Thank you for the patch! Note that this patch adds the

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Scott Dial
On 8/24/2011 4:11 AM, Victor Stinner wrote: > On 24/08/2011 06:59, Scott Dial wrote: >> On 8/23/2011 6:38 PM, Victor Stinner wrote: >>> On Tuesday 23 August 2011 00:14:40, Antoine Pitrou wrote: - You could try to run stringbench, which can be found at http://svn.python.org/projects/s

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Glenn Linderman
On 8/24/2011 1:18 AM, "Martin v. Löwis" wrote: So am I correctly reading between the lines when, after reading this thread so far, and the complete issue discussion so far, that I see a PEP 393 revision or replacement that has the following characteristics: 1) Narrow builds are dropped. PEP 393

Re: [Python-Dev] FileSystemError or FilesystemError?

2011-08-24 Thread Cameron Simpson
On 24Aug2011 12:31, Nick Coghlan wrote: | On Wed, Aug 24, 2011 at 5:19 AM, Steven D'Aprano wrote: | > Antoine Pitrou wrote: | >> When reviewing the PEP 3151 implementation (*), Ezio commented that | >> "FileSystemError" looks a bit strange and that "FilesystemError" would | >> be a better spellin

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On 24/08/2011 04:56, Torsten Becker wrote: On Tue, Aug 23, 2011 at 18:56, Victor Stinner wrote: kind=0 is used and public, it's PyUnicode_WCHAR_KIND. Is it still necessary? It looks to be only used in PyUnicode_DecodeUnicodeEscape(). If it can be removed, it would be nice to have kind in

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Stephen J. Turnbull
Terry Reedy writes: > The current UCS2 Unicode string implementation, by design, quickly gives > WRONG answers for len(), iteration, indexing, and slicing if a string > contains any non-BMP (surrogate pair) Unicode characters. That may have > been excusable when there essentially were no su

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Martin v. Löwis
> So am I correctly reading between the lines when, after reading this > thread so far, and the complete issue discussion so far, that I see a > PEP 393 revision or replacement that has the following characteristics: > > 1) Narrow builds are dropped. PEP 393 already drops narrow builds. > 2) The

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On 24/08/2011 04:41, Torsten Becker wrote: On Tue, Aug 23, 2011 at 18:27, Victor Stinner wrote: I posted a patch to re-add it: http://bugs.python.org/issue12819#msg142867 Thank you for the patch! Note that this patch adds the fast path only to the helper function which determines the len

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On 24/08/2011 06:59, Scott Dial wrote: On 8/23/2011 6:38 PM, Victor Stinner wrote: On Tuesday 23 August 2011 00:14:40, Antoine Pitrou wrote: - You could try to run stringbench, which can be found at http://svn.python.org/projects/sandbox/trunk/stringbench (*) and there's iobench (the te

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On 24/08/2011 04:41, Torsten Becker wrote: On Tue, Aug 23, 2011 at 10:08, Antoine Pitrou wrote: Macros are useful to shield the abstraction from the implementation. If you access the members directly, and the unicode object is represented differently in some future version of Python (say e.g

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Glenn Linderman
On 8/23/2011 5:46 PM, Terry Reedy wrote: On 8/23/2011 6:20 AM, "Martin v. Löwis" wrote: On 23.08.2011 11:46, Xavier Morel wrote: Mostly ASCII is pretty common for Western European languages (French, for instance, is probably 90 to 95% ASCII). It's also a risk in English, when the writer "co

Re: [Python-Dev] FileSystemError or FilesystemError?

2011-08-24 Thread Stephen J. Turnbull
Nick Coghlan writes: > Since I tend to use the one word 'filesystem' form myself (ditto for > 'filename'), I'm +1 for FilesystemError, but I'm only -0 for > FileSystemError (so I expect that will be the option chosen, given > other responses). I slightly prefer FilesystemError because it pars

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Stefan Behnel
Torsten Becker, 24.08.2011 04:41: Also, common, now simple, checks for "unicode->str == NULL" would look more ambiguous with a union ("unicode->str.latin1 == NULL"). You could just add yet another field "any", i.e. union { unsigned char* latin1; Py_UCS2* ucs2; Py_UCS4*