Re: [Python-Dev] New Py_UNICODE doc

2005-05-11 M.-A. Lemburg
Martin v. Löwis wrote: > M.-A. Lemburg wrote: > >>If all you're interested in is the lexical class of the code points >>in a string, you could use such a codec to map each code point >>to a code point representing the lexical class. > > > How can I efficiently implement such a codec? The whole p

Re: [Python-Dev] New Py_UNICODE doc

2005-05-11 Nicholas Bastin
On May 10, 2005, at 7:34 PM, James Y Knight wrote: > If you're going to call python's implementation UTF-16, I'd consider > all these very serious deficiencies: The --enable-unicode option declares a character encoding form (CEF), not a character encoding scheme (CES). It is unfortunate that U
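A small Python 3 sketch (not from the thread) of the distinction Bastin draws. The UTF-16 encoding form is a sequence of 16-bit code units. The encoding schemes UTF-16BE, UTF-16LE and UTF-16 (with a byte order mark) are different byte serializations of those same units. The helper name `code_units` is hypothetical; the codec names are Python's standard ones.

```python
import struct

def code_units(text: str) -> list[int]:
    """UTF-16 encoding form: the sequence of 16-bit code units."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(data) // 2}H", data))

text = "\U00012345"  # a character outside the BMP

# The same code units (one surrogate pair) regardless of byte order...
units = code_units(text)           # [0xD808, 0xDF45]

# ...but three different encoding schemes (byte serializations).
be = text.encode("utf-16-be")      # b'\xd8\x08\xdf\x45'
le = text.encode("utf-16-le")      # b'\x08\xd8\x45\xdf'
with_bom = text.encode("utf-16")   # BOM in native byte order, then the units
```

Only the byte order and the optional BOM differ, which is why a CEF label such as UTF-16 says nothing by itself about the bytes on disk.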

Re: [Python-Dev] New Py_UNICODE doc

2005-05-10 James Y Knight
On May 10, 2005, at 2:48 PM, Nicholas Bastin wrote: > On May 9, 2005, at 12:59 AM, Martin v. Löwis wrote: > > >>> Wow, what an inane way of looking at it. I don't know what world >>> you >>> live in, but in my world, users read the configure options and >>> suppose >>> that they mean somethin

Re: [Python-Dev] New Py_UNICODE doc

2005-05-10 Martin v. Löwis
Nicholas Bastin wrote: > I'm perfectly happy to continue supporting --enable-unicode=ucs2, but > not displaying it as an option. Is that acceptable to you? It is. Somewhere, the code should say that this is for backwards compatibility, of course (so people won't remove it too easily; if there is

Re: [Python-Dev] New Py_UNICODE doc

2005-05-10 Martin v. Löwis
M.-A. Lemburg wrote: > If all you're interested in is the lexical class of the code points > in a string, you could use such a codec to map each code point > to a code point representing the lexical class. How can I efficiently implement such a codec? The whole point is doing that in pure Python (
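One pure-Python approximation of the "map each code point to its lexical class" idea, as a sketch only (not the codec Lemburg had in mind): `str.translate` accepts any object with `__getitem__`, so a small class can return a one-letter class for every code point. The three classes and the category-based rules below are hypothetical simplifications, not the XML grammar.

```python
import re
import unicodedata

class LexicalClassTable:
    """Map each code point to a one-letter lexical class.

    Hypothetical classes for illustration only:
    'S' = may start a name, 'N' = may only continue a name, 'O' = other.
    """

    def __getitem__(self, codepoint: int) -> str:
        ch = chr(codepoint)
        category = unicodedata.category(ch)
        if ch in "_:" or category.startswith("L"):
            return "S"
        if ch in "-." or category in ("Nd", "Mn", "Mc"):
            return "N"
        return "O"

_TABLE = LexicalClassTable()
_NAME = re.compile(r"S[SN]*")

def lexical_classes(text: str) -> str:
    # str.translate looks up _TABLE[ord(ch)] for every character of text.
    return text.translate(_TABLE)

def is_name(text: str) -> bool:
    # Once classified, a name check is a cheap regex over the class string.
    return _NAME.fullmatch(lexical_classes(text)) is not None
```

Because `__getitem__` runs Python code for every character, this does not settle Martin's efficiency question; speed would need a precomputed table or a C-level codec.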

Re: [Python-Dev] New Py_UNICODE doc

2005-05-10 Nicholas Bastin
On May 9, 2005, at 12:59 AM, Martin v. Löwis wrote: >> Wow, what an inane way of looking at it. I don't know what world you >> live in, but in my world, users read the configure options and suppose >> that they mean something. In fact, they *have* to go off on their own >> to assume something,

Re: [Python-Dev] New Py_UNICODE doc

2005-05-10 M.-A. Lemburg
Martin v. Löwis wrote: > M.-A. Lemburg wrote: > >>On sre character classes: I don't think that these provide >>a good approach to XML lexical classes - custom functions >>or methods or maybe even a codec mapping the characters >>to their XML lexical class are much more efficient in >>practice. >

Re: [Python-Dev] New Py_UNICODE doc

2005-05-09 Martin v. Löwis
M.-A. Lemburg wrote: > On sre character classes: I don't think that these provide > a good approach to XML lexical classes - custom functions > or methods or maybe even a codec mapping the characters > to their XML lexical class are much more efficient in > practice. That isn't my experience: func
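For comparison, a sketch of the sre character-class approach Martin describes. The ranges are the NameStartChar and NameChar productions of XML 1.0 (Fifth Edition, 2008), which postdates this thread; the 2005-era editions defined names differently. The last NameStartChar range lies outside the BMP, which a narrow (UCS-2) build could presumably not express as a single-character range.

```python
import re

# NameStartChar and NameChar from XML 1.0 (Fifth Edition) as sre classes.
_START = (r":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
          r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
          r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF")
_CHAR = _START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

XML_NAME = re.compile(f"[{_START}][{_CHAR}]*")

def is_xml_name(text: str) -> bool:
    """True if text matches the XML Name production."""
    return XML_NAME.fullmatch(text) is not None
```

The whole classification runs inside the regex engine, which is the efficiency argument for character classes over per-character Python code.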

Re: [Python-Dev] New Py_UNICODE doc

2005-05-09 M.-A. Lemburg
Martin v. Löwis wrote: > M.-A. Lemburg wrote: > >>Unicode has many code points that are meant only for composition >>and don't have any standalone meaning, e.g. a combining acute >>accent (U+0301), yet they are perfectly valid code points - >>regardless of UCS-2 or UCS-4. It is easily possible to

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Martin v. Löwis
Nicholas Bastin wrote: >> Again, patches are welcome. I was opposed to Nick's proposed changes, >> since they explicitly said that you are not supposed to know what >> is in a Py_UNICODE. Integrating the essence of PEP 261 into the >> main documentation would be a worthwhile task. > > > You can't

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Martin v. Löwis
Nicholas Bastin wrote: > It's not always 2 bytes on Windows. Users can alter the config options > (and not unreasonably so, btw, on 64-bit windows platforms). Did you try that? I'm not sure it even builds when you do so, but if it does, you will lose the "mbcs" codec, and the ability to use Unico

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Martin v. Löwis
Nicholas Bastin wrote: >> Changing the documentation that goes along with the option >> would be fine. > > > That is exactly what I proposed originally, which you shot down. Please > actually read the contents of my messages. What I said was "change the > configure option and related documentat

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Nicholas Bastin
On May 8, 2005, at 1:44 PM, Martin v. Löwis wrote: > Shane Hathaway wrote: >> Fair enough. The original point is that the documentation is unclear >> about what a Py_UNICODE[] contains. I deduced that it contains either >> UCS2 or UCS4 and implemented accordingly. Not only did I guess wrong, >

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Nicholas Bastin
On May 8, 2005, at 5:28 AM, Martin v. Löwis wrote: > Nicholas Bastin wrote: >> All of my proposals for what to change the documentation to have been >> shot down by Martin. If someone has better verbiage that they'd like >> to see, I'd be perfectly happy to patch the doc. > > I don't look into the

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Nicholas Bastin
On May 8, 2005, at 5:15 AM, Martin v. Löwis wrote: > 'configure takes an option --enable-unicode, with the possible > values "ucs2", "ucs4", "yes" (equivalent to no argument), > and "no" (equivalent to --disable-unicode)' > > *THIS* documentation would break. This documentation is factually > co

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Martin v. Löwis
Shane Hathaway wrote: > Fair enough. The original point is that the documentation is unclear > about what a Py_UNICODE[] contains. I deduced that it contains either > UCS2 or UCS4 and implemented accordingly. Not only did I guess wrong, > but others will probably guess wrong too. Something in t

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Shane Hathaway
M.-A. Lemburg wrote: > All this talk about UTF-16 vs. UCS-2 is not very useful > and strikes me as purely academic. > > The reference to possible breakage by slicing a Unicode string and > breaking a surrogate pair is valid, the idea of UCS-4 being > less prone to breakage is a myth: Fair enough. The or
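The surrogate-pair part of this argument can be simulated in Python 3, which indexes by code point, by inspecting the UTF-16 code units a narrow build would have stored. The helpers `utf16_code_units`, `is_high_surrogate` and `safe_cut` are hypothetical, written only for illustration.

```python
import struct

def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit code units a narrow (UCS-2/UTF-16) build would store."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF

def safe_cut(units: list[int], n: int) -> list[int]:
    """Take the first n code units, backing off one unit if the cut
    would separate a high surrogate from its low surrogate."""
    if 0 < n < len(units) and is_high_surrogate(units[n - 1]):
        n -= 1
    return units[:n]

s = "a\U00012345"
units = utf16_code_units(s)   # [0x61, 0xD808, 0xDF45]
naive = units[:2]             # ends in a lone high surrogate
careful = safe_cut(units, 2)  # [0x61]
```

The check only protects surrogate pairs; combining sequences can still be split, which is Lemburg's point about UCS-4.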

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Martin v. Löwis
Nicholas Bastin wrote: > All of my proposals for what to change the documentation to have been > shot down by Martin. If someone has better verbiage that they'd like > to see, I'd be perfectly happy to patch the doc. I don't look into the specific wording - you speak English much better than I do

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Martin v. Löwis
Nicholas Bastin wrote: >> -1. This breaks existing documentation and usage, and provides only >> minimum value. > > > Have you been missing this conversation? UTF-16 is *WHAT PYTHON > CURRENTLY IMPLEMENTS*. The current documentation is flat out wrong. > Breaking that isn't a big problem in my

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Martin v. Löwis
Nicholas Bastin wrote: > I don't consider either alternative useless (well, I consider UCS-2 to > be largely useless in the general case, but as we've already discussed > here, Python isn't really UCS-2). However, I would be a lot happier if > we just chose *one*, and all Pythons used that one.

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Martin v. Löwis
M.-A. Lemburg wrote: > I believe that it would be more appropriate to adjust the _tkinter > module to adapt to the TCL Unicode size rather than > forcing the complete Python system to adapt to TCL - I don't > really see the point in an optional extension module > defining the default for the interp

Re: [Python-Dev] New Py_UNICODE doc

2005-05-08 Martin v. Löwis
M.-A. Lemburg wrote: > Unicode has many code points that are meant only for composition > and don't have any standalone meaning, e.g. a combining acute > accent (U+0301), yet they are perfectly valid code points - > regardless of UCS-2 or UCS-4. It is easily possible to break > such a combining seq
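Lemburg's combining-accent example can be checked in any Python 3, which indexes by code point much like a wide (UCS-4) build. `prefix_keeping_marks` is a hypothetical helper. It relies on `unicodedata.combining`, which returns the canonical combining class, so some spacing marks with class 0 would not be caught.

```python
import unicodedata

decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT (U+0301)
print(len(decomposed))   # 2: two valid code points, one visible character
print(decomposed[:1])    # e  (slicing by code point drops the accent)

# NFC composes the pair into the single code point U+00E9 where one exists.
composed = unicodedata.normalize("NFC", decomposed)
print(len(composed))     # 1

def prefix_keeping_marks(text: str, n: int) -> str:
    """Cut text at n code points, backing up so a base character is not
    separated from the combining marks that follow it."""
    while 0 < n < len(text) and unicodedata.combining(text[n]):
        n -= 1
    return text[:n]
```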

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 M.-A. Lemburg
Nicholas Bastin wrote: > On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote: > > >>However, I don't understand all the excitement >>about Py_UNICODE: if you don't like the way this Python >>typedef works, you are free to interface to Python using >>any of the supported encodings using PyUnicode_Enco

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Nicholas Bastin
On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote: > Please upload your doc-patch to SF. All of my proposals for what to change the documentation to have been shot down by Martin. If someone has better verbiage that they'd like to see, I'd be perfectly happy to patch the doc. My last suggestion

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Nicholas Bastin
On May 7, 2005, at 5:09 PM, M.-A. Lemburg wrote: > However, I don't understand all the excitement > about Py_UNICODE: if you don't like the way this Python > typedef works, you are free to interface to Python using > any of the supported encodings using PyUnicode_Encode() > and PyUnicode_Decode()
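Lemburg's advice, transposed to Python code, is to avoid depending on the internal width and always convert through a named codec. He refers to the C functions PyUnicode_Encode() and PyUnicode_Decode(); this sketch uses their Python-level counterparts, `str.encode` and `bytes.decode`, and shows that each codec yields a well-defined byte layout.

```python
text = "a\U00012345"   # one BMP and one non-BMP character

encoded_sizes = {}
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(codec)          # bytes fixed by the codec, not the build
    assert data.decode(codec) == text  # lossless round trip
    encoded_sizes[codec] = len(data)

print(encoded_sizes)   # {'utf-8': 5, 'utf-16-le': 6, 'utf-32-le': 8}
```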

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 M.-A. Lemburg
Nicholas Bastin wrote: > On May 7, 2005, at 9:29 AM, Martin v. Löwis wrote: >>With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start >>supporting the full Unicode CCS the same way it supports UCS-2. >>Individual surrogate values remain accessible, and supporting >>non-BMP characters is le

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Nicholas Bastin
On May 7, 2005, at 9:29 AM, Martin v. Löwis wrote: > Nicholas Bastin wrote: >> --enable-unicode=ucs2 >> >> be replaced with: >> >> --enable-unicode=utf16 >> >> and the docs be updated to reflect more accurately the variance of the >> internal storage type. > > -1. This breaks existing documentati

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Nicholas Bastin
On May 7, 2005, at 9:24 AM, Martin v. Löwis wrote: > Nicholas Bastin wrote: >> Yes, but the important question here is why would we want that? Why >> doesn't Python just have *one* internal representation of a Unicode >> character? Having more than one possible definition just creates >> proble

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 M.-A. Lemburg
Martin v. Löwis wrote: > M.-A. Lemburg wrote: > >>Hmm, looking at the configure.in script, it seems you're right. >>I wonder why this weird dependency on TCL was added. > > > If Python is configured for UCS-2, and Tcl for UCS-4, then > Tkinter would not work out of the box. Hence the weird depen

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 M.-A. Lemburg
Shane Hathaway wrote: > Martin v. Löwis wrote: > >>Shane Hathaway wrote: >> >> >>>I agree that UCS4 is needed. There is a balancing act here; UTF-16 is >>>widely used and takes less space, while UCS4 is easier to treat as an >>>array of characters. Maybe we can have both: unicode objects start w

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Martin v. Löwis
Shane Hathaway wrote: > Py_UNICODE would always be 32 bits wide. This would break PythonWin, which relies on Py_UNICODE being the same as WCHAR_T. PythonWin is not broken, it just hasn't been ported to UCS-4, yet (and porting this is difficult and will cause a performance loss). Regards, Martin

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Shane Hathaway
Martin v. Löwis wrote: > Shane Hathaway wrote: > >>I agree that UCS4 is needed. There is a balancing act here; UTF-16 is >>widely used and takes less space, while UCS4 is easier to treat as an >>array of characters. Maybe we can have both: unicode objects start with >>an internal representation
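Shane's "start in UTF-16, get promoted" idea can be sketched as a width-selection rule. This sketch assumes promotion goes to a 4-byte representation. It is a hypothetical illustration of the proposal as quoted, not code from any Python release. Python 3.3 later adopted a related flexible representation (PEP 393), which picks 1, 2 or 4 bytes per code point from a string's largest character.

```python
def storage_width(text: str) -> int:
    """Bytes per code unit under a hypothetical 'UTF-16 until needed' scheme:
    stay at 2 bytes while every character is in the BMP, promote to 4 otherwise."""
    return 4 if any(ord(ch) > 0xFFFF for ch in text) else 2

def storage_size(text: str) -> int:
    """Bytes of character data. Because promotion happens before any non-BMP
    character is stored, the 2-byte form never needs surrogate pairs, so one
    code unit always equals one code point."""
    return storage_width(text) * len(text)
```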

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Martin v. Löwis
> Yes, but the first few steps are the same for nearly everyone, and > people need more help taking the first few steps. Contributions to the documentation are certainly welcome. Regards, Martin

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Martin v. Löwis
Shane Hathaway wrote: > I agree that UCS4 is needed. There is a balancing act here; UTF-16 is > widely used and takes less space, while UCS4 is easier to treat as an > array of characters. Maybe we can have both: unicode objects start with > an internal representation in UTF-16, but get promoted

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Martin v. Löwis
Nicholas Bastin wrote: > --enable-unicode=ucs2 > > be replaced with: > > --enable-unicode=utf16 > > and the docs be updated to reflect more accurately the variance of the > internal storage type. -1. This breaks existing documentation and usage, and provides only minimum value. With --enable-u

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Martin v. Löwis
Nicholas Bastin wrote: > Yes, but the important question here is why would we want that? Why > doesn't Python just have *one* internal representation of a Unicode > character? Having more than one possible definition just creates > problems, and provides no value. It does provide value, there ar

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Shane Hathaway
Martin v. Löwis wrote: > Shane Hathaway wrote: >>More generally, how should a non-unicode-expert writing Python extension >>code find out the minimum they need to know about unicode to use the >>Python unicode API? The API reference [1] ought to at least have a list >>of background links. I had t

Re: [Python-Dev] New Py_UNICODE doc

2005-05-07 Shane Hathaway
Martin v. Löwis wrote: > Define correctly. Python, in ucs2 mode, will allow you to address individual > surrogate codes, e.g. in indexing. So you get > > u"\U00012345"[0] When Python encodes characters internally in UCS-2, I would expect u"\U00012345" to produce a UnicodeError("character can not
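The surrogate arithmetic behind Martin's example is short. This Python 3 sketch (function names hypothetical) splits U+12345 into the high/low pair a narrow build stores. Indexing element 0 of that storage yields the high surrogate alone, not an error.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a non-BMP code point into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need surrogates")
    v = cp - 0x10000                          # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x12345)
print(hex(high), hex(low))   # 0xd808 0xdf45
# On a UCS-2 (narrow) build, u"\U00012345"[0] evaluated to chr(high):
print(repr(chr(high)))       # '\ud808'
```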

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Nicholas Bastin
On May 6, 2005, at 8:11 PM, Martin v. Löwis wrote: > Nicholas Bastin wrote: >> Well, this is a completely separate issue/problem. The internal >> representation is UTF-16, and should be stated as such. If the >> built-in methods actually don't work with surrogate pairs, then that >> should be fi

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Nicholas Bastin
On May 6, 2005, at 8:25 PM, Martin v. Löwis wrote: > Nicholas Bastin wrote: >> Yes. Not only in my mind, but in the Python source code. If >> Py_UNICODE is 4 bytes wide, then the encoding is UTF-32 (UCS-4), >> otherwise the encoding is UTF-16 (*not* UCS-2). > > I see. Some people equate "encodi

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Martin v. Löwis
Nicholas Bastin wrote: > Well, this is a completely separate issue/problem. The internal > representation is UTF-16, and should be stated as such. If the > built-in methods actually don't work with surrogate pairs, then that > should be fixed. Yes to the former, no to the latter. PEP 261 speci

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Martin v. Löwis
Nicholas Bastin wrote: > Yes. Not only in my mind, but in the Python source code. If > Py_UNICODE is 4 bytes wide, then the encoding is UTF-32 (UCS-4), > otherwise the encoding is UTF-16 (*not* UCS-2). I see. Some people equate "encoding" with "encoding scheme"; neither UTF-32 nor UTF-16 is an

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Martin v. Löwis
Nicholas Bastin wrote: > What I mean is pretty clear. UCS-2 does *NOT* support surrogate pairs. > If it did, it would be called UTF-16. If Python really supported > UCS-2, then surrogate pairs from UTF-16 inputs would either get turned > into two garbage characters, or the "I couldn't transc

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Martin v. Löwis
Shane Hathaway wrote: > Ok. Thanks for helping me understand where Python is WRT unicode. I > can work around the issues (or maybe try to help solve them) now that I > know the current state of affairs. If Python correctly handled UTF-16 > strings internally, we wouldn't need the UCS-4 configura

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Nicholas Bastin
On May 6, 2005, at 7:45 PM, Martin v. Löwis wrote: > Nicholas Bastin wrote: >> Because the encoding of that buffer appears to be different depending >> on >> the configure options. > > What makes it appear so? sizeof(Py_UNICODE) changes when you change > the option - does that, in your mind, mea

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Nicholas Bastin
On May 6, 2005, at 7:43 PM, Martin v. Löwis wrote: > Nicholas Bastin wrote: >> If this is the case, then we're clearly misleading users. If the >> configure script says UCS-2, then as a user I would assume that >> surrogate pairs would *not* be encoded, because I chose UCS-2, and it >> doesn't s

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Martin v. Löwis
M.-A. Lemburg wrote: > Hmm, looking at the configure.in script, it seems you're right. > I wonder why this weird dependency on TCL was added. If Python is configured for UCS-2, and Tcl for UCS-4, then Tkinter would not work out of the box. Hence the weird dependency. Regards, Martin

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Martin v. Löwis
Nicholas Bastin wrote: > No, that's not true. Python lets you choose UCS-4 or UCS-2. What the > default is depends on your platform. The truth is more complicated. If your Tcl is built for UCS-4, then Python will also be built for UCS-4 (unless overridden by command line). Otherwise, Python will

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Martin v. Löwis
Nicholas Bastin wrote: > Because the encoding of that buffer appears to be different depending on > the configure options. What makes it appear so? sizeof(Py_UNICODE) changes when you change the option - does that, in your mind, mean that the encoding changes? > If that isn't true, then someone n

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Martin v. Löwis
Nicholas Bastin wrote: > If this is the case, then we're clearly misleading users. If the > configure script says UCS-2, then as a user I would assume that > surrogate pairs would *not* be encoded, because I chose UCS-2, and it > doesn't support that. What do you mean by that? That the interprete

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Martin v. Löwis
Shane Hathaway wrote: > Then something in the Python docs ought to say why UCS-2 is not what you > want. I still don't know; I've heard differing opinions on the subject. > Some say you'll never need more than what UCS-2 provides. Is that > incorrect? That clearly depends on who "you" is. > Mo

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Bob Ippolito
On May 6, 2005, at 7:05 PM, Shane Hathaway wrote: > Nicholas Bastin wrote: > >> On May 6, 2005, at 5:21 PM, Shane Hathaway wrote: >> >>> Wait... are you saying a Py_UNICODE array contains either UTF-16 or >>> UTF-32 characters, but never UCS-2? That's a big surprise to >>> me. I may >>> need t

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Martin v. Löwis
Nicholas Bastin wrote: > I'm not sure the Python documentation is the place to teach someone > about unicode. The ISO 10646 pretty clearly defines UCS-2 as only > containing characters in the BMP (plane zero). On the other hand, I > don't know why python lets you choose UCS-2 anyhow, since it's a
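Under the ISO 10646 definition Bastin cites, text is representable in strict UCS-2 only if it consists of BMP characters. A minimal Python 3 check (hypothetical helper) that also treats surrogate code points as not representable, on the assumption that strict UCS-2 assigns no characters to that range:

```python
def fits_ucs2(text: str) -> bool:
    """True if every character is a BMP code point usable in strict UCS-2:
    at most U+FFFF and outside the surrogate range U+D800..U+DFFF."""
    return all(ord(ch) <= 0xFFFF and not 0xD800 <= ord(ch) <= 0xDFFF
               for ch in text)
```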

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Martin v. Löwis
Nicholas Bastin wrote: > The important piece of information is that it is not guaranteed to be a > particular one of those sizes. Once you can't guarantee the size, no > one really cares what size it is. Please trust many years of experience: This is just not true. People do care, and they want t

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Shane Hathaway
Nicholas Bastin wrote: > > On May 6, 2005, at 5:21 PM, Shane Hathaway wrote: >> Wait... are you saying a Py_UNICODE array contains either UTF-16 or >> UTF-32 characters, but never UCS-2? That's a big surprise to me. I may >> need to change my PyXPCOM patch to fit this new understanding. I tried

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Nicholas Bastin
On May 6, 2005, at 5:21 PM, Shane Hathaway wrote: > Nicholas Bastin wrote: >> On May 6, 2005, at 3:42 PM, James Y Knight wrote: >>> It means all the string operations treat strings as if they were >>> UCS-2, but that in actuality, they are UTF-16. Same as the case in >>> the >>> windows APIs and

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Shane Hathaway
Nicholas Bastin wrote: > On May 6, 2005, at 3:42 PM, James Y Knight wrote: >>It means all the string operations treat strings as if they were >>UCS-2, but that in actuality, they are UTF-16. Same as the case in the >>windows APIs and Java. That is, all string operations are essentially >>broken,
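"String operations treat strings as if they were UCS-2" means that length, indexing and iteration count code units. A surrogate-aware alternative has to walk the code units and join pairs. A minimal Python 3 sketch over a plain list of code units, with a hypothetical function name and a lenient policy of passing unpaired surrogates through unchanged:

```python
from collections.abc import Iterable, Iterator

def iter_code_points(units: Iterable[int]) -> Iterator[int]:
    """Yield code points from UTF-16 code units, joining surrogate pairs.
    Unpaired surrogates are passed through unchanged."""
    pending = None                   # high surrogate waiting for its partner
    for unit in units:
        if pending is not None:
            if 0xDC00 <= unit <= 0xDFFF:
                yield 0x10000 + ((pending - 0xD800) << 10) + (unit - 0xDC00)
                pending = None
                continue
            yield pending            # high surrogate not followed by a low one
            pending = None
        if 0xD800 <= unit <= 0xDBFF:
            pending = unit
        else:
            yield unit
    if pending is not None:
        yield pending
```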

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Nicholas Bastin
On May 6, 2005, at 3:42 PM, James Y Knight wrote: > On May 6, 2005, at 2:49 PM, Nicholas Bastin wrote: >> If this is the case, then we're clearly misleading users. If the >> configure script says UCS-2, then as a user I would assume that >> surrogate pairs would *not* be encoded, because I chose

Re: [Python-Dev] New Py_UNICODE doc (Another Attempt)

2005-05-06 Nicholas Bastin
After reading through the code and the comments in this thread, I propose the following in the documentation as the definition of Py_UNICODE: "This type represents the storage type which is used by Python internally as the basis for holding Unicode ordinals. Extension module developers should

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 M.-A. Lemburg
Nicholas Bastin wrote: > On May 6, 2005, at 3:17 AM, M.-A. Lemburg wrote: > > >>You've got that wrong: Python lets you choose UCS-4 - >>UCS-2 is the default. > > > No, that's not true. Python lets you choose UCS-4 or UCS-2. What the > default is depends on your platform. If you run raw con

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 James Y Knight
On May 6, 2005, at 2:49 PM, Nicholas Bastin wrote: > If this is the case, then we're clearly misleading users. If the > configure script says UCS-2, then as a user I would assume that > surrogate pairs would *not* be encoded, because I chose UCS-2, and it > doesn't support that. I would assume th

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Nicholas Bastin
On May 6, 2005, at 3:17 AM, M.-A. Lemburg wrote: > You've got that wrong: Python lets you choose UCS-4 - > UCS-2 is the default. No, that's not true. Python lets you choose UCS-4 or UCS-2. What the default is depends on your platform. If you run raw configure, some systems will choose UCS-

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Nicholas Bastin
On May 6, 2005, at 3:25 AM, M.-A. Lemburg wrote: > I don't see why you shouldn't use Py_UNICODE buffer directly. > After all, the reason why we have that typedef is to make it > possible to program against an abstract type - regardless of > its size on the given platform. Because the encoding of

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 Nicholas Bastin
On May 6, 2005, at 3:17 AM, M.-A. Lemburg wrote: > You've got that wrong: Python lets you choose UCS-4 - > UCS-2 is the default. > > Note that Python's Unicode codecs UTF-8 and UTF-16 > are surrogate aware and thus support non-BMP code points > regardless of the build type: A UCS2-build of Pytho

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 M.-A. Lemburg
Nicholas Bastin wrote: > On May 4, 2005, at 6:03 PM, Martin v. Löwis wrote: > > >>Nicholas Bastin wrote: >> >>>"This type represents the storage type which is used by Python >>>internally as the basis for holding Unicode ordinals. Extension >>>module >>>developers should make no assumptions abo

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 M.-A. Lemburg
Fredrik Lundh wrote: > Thomas Heller wrote: > > >>AFAIK, you can configure Python to use 16-bits or 32-bits Unicode chars, >>independent of the size of wchar_t. The HAVE_USABLE_WCHAR_T macro >>can be used by extension writers to determine if Py_UNICODE is the same as >>wchar_t. > > > note th

Re: [Python-Dev] New Py_UNICODE doc

2005-05-06 M.-A. Lemburg
Nicholas Bastin wrote: > On May 4, 2005, at 6:20 PM, Shane Hathaway wrote: > >>>Nicholas Bastin wrote: >>> >>> "This type represents the storage type which is used by Python internally as the basis for holding Unicode ordinals. Extension module developers should make no assumptio

Re: [Python-Dev] New Py_UNICODE doc

2005-05-05 Shane Hathaway
Nicholas Bastin wrote: > > On May 4, 2005, at 6:20 PM, Shane Hathaway wrote: >> On a related note, it would be helpful if the documentation provided a >> little more background on unicode encoding. Specifically, that UCS-2 is >> not the same as UTF-16, even though they're both two bytes wide and mos

Re: [Python-Dev] New Py_UNICODE doc

2005-05-05 Nicholas Bastin
On May 4, 2005, at 6:20 PM, Shane Hathaway wrote: > Martin v. Löwis wrote: >> Nicholas Bastin wrote: >> >>> "This type represents the storage type which is used by Python >>> internally as the basis for holding Unicode ordinals. Extension >>> module >>> developers should make no assumptions abo

Re: [Python-Dev] New Py_UNICODE doc

2005-05-05 Nicholas Bastin
On May 4, 2005, at 6:03 PM, Martin v. Löwis wrote: > Nicholas Bastin wrote: >> "This type represents the storage type which is used by Python >> internally as the basis for holding Unicode ordinals. Extension >> module >> developers should make no assumptions about the size of this type on >> a

Re: [Python-Dev] New Py_UNICODE doc

2005-05-04 Shane Hathaway
Martin v. Löwis wrote: > Nicholas Bastin wrote: > >>"This type represents the storage type which is used by Python >>internally as the basis for holding Unicode ordinals. Extension module >>developers should make no assumptions about the size of this type on >>any given platform." > > > But

Re: [Python-Dev] New Py_UNICODE doc

2005-05-04 Martin v. Löwis
Nicholas Bastin wrote: > "This type represents the storage type which is used by Python > internally as the basis for holding Unicode ordinals. Extension module > developers should make no assumptions about the size of this type on > any given platform." But people want to know "Is Python's Un

Re: [Python-Dev] New Py_UNICODE doc

2005-05-04 Thomas Heller
"Fredrik Lundh" <[EMAIL PROTECTED]> writes: > Thomas Heller wrote: > >> AFAIK, you can configure Python to use 16-bits or 32-bits Unicode chars, >> independend from the size of wchar_t. The HAVE_USABLE_WCHAR_T macro >> can be used by extension writers to determine if Py_UNICODE is the same as >>

Re: [Python-Dev] New Py_UNICODE doc

2005-05-04 Fredrik Lundh
Thomas Heller wrote: > AFAIK, you can configure Python to use 16-bits or 32-bits Unicode chars, > independent of the size of wchar_t. The HAVE_USABLE_WCHAR_T macro > can be used by extension writers to determine if Py_UNICODE is the same as > wchar_t. note that "usable" is more than just "same
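From Python code one can at least compare the two widths under discussion. This rough sketch uses `sys.maxunicode`, which was 0xFFFF on narrow builds and 0x10FFFF on wide builds before Python 3.3 and is always 0x10FFFF since, and `ctypes.sizeof(ctypes.c_wchar)`. It does not read the HAVE_USABLE_WCHAR_T macro itself, which only C code can see, and the function names are hypothetical.

```python
import ctypes
import sys

def py_unicode_width() -> int:
    """Bytes per stored code unit, inferred from sys.maxunicode
    (2 on narrow UCS-2 builds, 4 on wide UCS-4 builds and on Python 3.3+)."""
    return 2 if sys.maxunicode == 0xFFFF else 4

def wchar_t_width() -> int:
    """sizeof(wchar_t) for this platform's C ABI (commonly 2 on Windows, 4 on Linux)."""
    return ctypes.sizeof(ctypes.c_wchar)

def widths_match() -> bool:
    # Equal width is necessary for Py_UNICODE and wchar_t to be
    # interchangeable, but it may not be sufficient on its own.
    return py_unicode_width() == wchar_t_width()
```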

Re: [Python-Dev] New Py_UNICODE doc

2005-05-04 Nicholas Bastin
On May 4, 2005, at 1:02 PM, Michael Hudson wrote: > Nicholas Bastin writes: > >> The current documentation for Py_UNICODE states: >> >> "This type represents a 16-bit unsigned storage type which is used by >> Python internally as basis for holding Unicode ordinals. On platfor

Re: [Python-Dev] New Py_UNICODE doc

2005-05-04 Michael Hudson
Nicholas Bastin writes: > The current documentation for Py_UNICODE states: > > "This type represents a 16-bit unsigned storage type which is used by > Python internally as basis for holding Unicode ordinals. On platforms > where wchar_t is available and also has 16-bits, P

Re: [Python-Dev] New Py_UNICODE doc

2005-05-04 Thomas Heller
Nicholas Bastin writes: > The current documentation for Py_UNICODE states: > > "This type represents a 16-bit unsigned storage type which is used by > Python internally as basis for holding Unicode ordinals. On platforms > where wchar_t is available and also has 16-bits, P