Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-31 Thread François Pinard
[Greg Ewing] >> All development is done in house by French people. All documentation, >> external or internal, comments, identifier and function names, >> everything is in French. > There's nothing stopping you from creating your own Frenchified > version of Python that lets you use all the c

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-31 Thread Greg Ewing
François Pinard wrote: > All development is done in house by French people. All documentation, > external or internal, comments, identifier and function names, > everything is in French. There's nothing stopping you from creating your own Frenchified version of Python that lets you use all the

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-31 Thread Steve Holden
Adam Olsen wrote: > On 10/30/05, François Pinard <[EMAIL PROTECTED]> wrote: > >>All development is done in house by French people. All documentation, >>external or internal, comments, identifier and function names, >>everything is in French. Some of the developers here have had a long >>programm

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-30 Thread Adam Olsen
On 10/30/05, François Pinard <[EMAIL PROTECTED]> wrote: > All development is done in house by French people. All documentation, > external or internal, comments, identifier and function names, > everything is in French. Some of the developers here have had a long > programming life, while they on

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-30 Thread François Pinard
[Martin von Löwis] > My canonical example is François Pinard, who keeps requesting it, > saying that local people were surprised they couldn't use accented > characters in Python. Perhaps that's because he actually is Quebecian > :-) I presume I should comment a bit on this. People here are

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-26 Thread Bengt Richter
At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote: >Bengt Richter wrote: >> Please bear with me for a few paragraphs ;-) > >Please note that source code encoding doesn't really have >anything to do with the way the interpreter executes the >program - it's merely a way to tell the parser how to >conver

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Neil Hodgson
M.-A. Lemburg: > You mean a slice that slices out the next ? Yes. > This sounds a lot like you'd want iterators for the various > index types. Should be possible to implement on top of the > proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc. Iterators may be helpful, but can a

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Guido van Rossum wrote: > Yes but why? What does this invariant do for him? I don't know about this person, but there are a few things that don't work properly in UTF-16 mode: - the Unicode character database fails to look up things. u"\U0001D670".isupper() gives false, but should give true
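A small sketch of why the lookup above goes wrong on a 16-bit build. It assumes a modern Python 3 interpreter (where strings are indexed by code point), not the 2.4-era builds discussed in the thread; the helper names utf16_code_units and surrogate_pair are hypothetical and exist only for this illustration. In UTF-16 the character is stored as two surrogate units, and neither unit on its own is an uppercase letter.

```python
import unicodedata

ch = "\U0001D670"  # the character quoted in the message above

def utf16_code_units(text):
    """Number of 16-bit units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its two UTF-16 surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

# One code point, but two UTF-16 code units.
print(unicodedata.name(ch), unicodedata.category(ch))
print(len(ch), utf16_code_units(ch))
print([hex(unit) for unit in surrogate_pair(ord(ch))])
```

A per-unit database lookup sees only the two surrogate values, which is one plausible reading of why isupper() answered false in UTF-16 mode.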

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Guido van Rossum
On 10/25/05, Bill Janssen <[EMAIL PROTECTED]> wrote: > I think he was more interested in the invariant Martin proposed, that > > len("\U0001") > > should always be the same and should always be 1. Yes but why? What does this invariant do for him?

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Bill Janssen
I think he was more interested in the invariant Martin proposed, that len("\U0001") should always be the same and should always be 1. Bill

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread Martin v. Löwis
Bill Janssen wrote: > I just got mail this morning from a researcher who wants exactly what > Martin described, and wondered why the default MacPython 2.4.2 didn't > provide it by default. :-) If all he wants is to represent Deseret, he can do so in a 16-bit Unicode type, too: Python supports UTF-

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Bengt Richter wrote: > At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote: > >>Bengt Richter wrote: >> >>>Please bear with me for a few paragraphs ;-) >> >>Please note that source code encoding doesn't really have >>anything to do with the way the interpreter executes the >>program - it's merely a way

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Neil Hodgson wrote: > M.-A. Lemburg: > > >>Unicode has the concept of combining code points, e.g. you can >>store an "é" (e with an accent) as "e" + "'". Now if you slice >>off the accent, you'll break the character that you encoded >>using combining code points. >>... >>next_(u, index) -> int
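To make the combining-code-point problem concrete, here is a rough sketch in Python 3 (an assumption; the thread concerns Python 2.x). The function iter_clusters is a hypothetical name, not one of the APIs proposed in the thread, and it uses a deliberately simplified rule: a code point with a non-zero canonical combining class is attached to the preceding base character. Full grapheme segmentation has more rules than this.

```python
import unicodedata

def iter_clusters(text):
    """Yield each base character together with its trailing combining marks.

    Simplified rule for illustration only: a code point whose canonical
    combining class is non-zero joins the cluster before it.
    """
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster

word = "re\u0301sume\u0301"        # "e" followed by COMBINING ACUTE ACCENT, twice
print(len(word))                   # counts code points, accents included
print(ascii(word[:2]))             # slicing by code point drops the accent
print([ascii(c) for c in iter_clusters(word)])
```

Iterating by cluster keeps each accent with its base letter, whereas a plain index-based slice can separate them even when every code point is stored in full.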

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Bill Janssen
Guido writes: > Oh, I don't doubt that they want it. But often they don't *need* it, > and the higher-level goal they are trying to accomplish can be dealt > with better in a different way. (Sort of my response to people asking > for static typing in Python as well. :-) I suppose that's true. But

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Greg Ewing
Guido van Rossum wrote: > Python's slice-and-dice model pretty much ensures that indexing is > common. Almost everything is ultimately represented as indices: regex > search results have the index in the API, find()/index() return > indices, many operations take a start and/or end index. Maybe th

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Greg Ewing
Guido van Rossum wrote: > I think the API should reflect the representation *to some extent*, > namely it shouldn't claim to have operations that are typically > thought of as O(1) that can only be implemented as O(n). Maybe a compromise could be reached by using a btree of chunks or something, s

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Guido van Rossum
On 10/24/05, Bill Janssen <[EMAIL PROTECTED]> wrote: > > > - yet others think: "I want all of Unicode, with proper, efficient > > >indexing, so I want four bytes per char". > > > > I doubt the last one though. Probably they really don't want efficient > > indexing, they want to perform higher-l

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Bill Janssen
> > - yet others think: "I want all of Unicode, with proper, efficient > >indexing, so I want four bytes per char". > > I doubt the last one though. Probably they really don't want efficient > indexing, they want to perform higher-level operations that currently > are only possible using effic

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Neil Hodgson
M.-A. Lemburg: > Unicode has the concept of combining code points, e.g. you can > store an "é" (e with an accent) as "e" + "'". Now if you slice > off the accent, you'll break the character that you encoded > using combining code points. > ... > next_(u, index) -> integer > > Returns th

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Martin v. Löwis
Antoine Pitrou wrote: >>There are many design alternatives: > > Wouldn't it be simpler to use: > - one-byte representation if every character <= 0xFF > - two-byte representation if every character <= 0x > - four-byte representation otherwise As I said: there are many alternatives. This one ha
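The width-selection rule suggested above can be sketched as follows. This is only an illustration in Python 3: element_width is a hypothetical helper, and the two-byte threshold is assumed to be 0xFFFF because the value is cut off in the archived text.

```python
def element_width(text):
    """Return the bytes per character a string would need under the rule above.

    Assumption: the two-byte cutoff is 0xFFFF (truncated in the archive).
    """
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4

for sample in ("abc", "\u00e9", "\u20ac", "\U0001D670"):
    print(ascii(sample), element_width(sample))
```

Note that a single wide character forces the whole string to the wider representation, which is one of the trade-offs such a scheme has to accept.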

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Guido van Rossum
On 10/24/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > Guido van Rossum wrote: > > Changing the APIs would be much work, although perhaps not impossible > > for Python 3000. For example, Raymond Hettinger's partition() API > > doesn't refer to indices at all, and can replace many uses of find()

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Martin v. Löwis
Guido van Rossum wrote: > Changing the APIs would be much work, although perhaps not impossible > for Python 3000. For example, Raymond Hettinger's partition() API > doesn't refer to indices at all, and can replace many uses of find() > or index(). I think Neil's proposal is not to make them go awa
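As a sketch of the point about partition(), the two hypothetical helpers below split a "key:value" line, once with find() and explicit indices and once with str.partition(), which never exposes an index. The helper names and the sample input are made up for illustration; str.partition itself is a real method in later Python versions.

```python
def split_with_find(line):
    """Index-based version: find() returns a position that we then slice on."""
    i = line.find(":")
    if i == -1:
        return line, ""
    return line[:i], line[i + 1:]

def split_with_partition(line):
    """Index-free version: partition() hands back the pieces directly."""
    key, _sep, value = line.partition(":")
    return key, value

for line in ("name:value", "no separator", "a:b:c"):
    print(split_with_find(line), split_with_partition(line))
```

Because the second form never mentions a position, it would work unchanged on an internal representation where indexing is not O(1).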

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Guido van Rossum
On 10/24/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote: > Indeed. My guess is that indexing is more common than you think, > especially when iterating over the string. Of course, iteration > could also operate on UTF-8, if you introduced string iterator > objects. Python's slice-and-dice model p

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Antoine Pitrou
> There are many design alternatives: one option would be to support > *three* internal representations in a single type, generating the > others from the one operation existing as needed. The default, initial > representation might be UTF-8, with UCS-4 only being generated when > indexing occurs,

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Martin v. Löwis
M.-A. Lemburg wrote: > There seems to be a general misunderstanding here: even if you > have UCS4 storage, it is still possible to slice a Unicode > string in a way which makes rendering it correctly. [impossible?] > Unicode has the concept of

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Martin v. Löwis
Neil Hodgson wrote: >For Windows, the code will get a little uglier, needing to perform > an allocation/encoding and deallocation more often than at present but > I don't think there will be a speed degradation as Windows is > currently performing a conversion from 8 bit to UTF-16 inside many >

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Bill Janssen
> Python should allow strings to > contain any Unicode character and should be indexable yielding > characters rather than half characters. Therefore Python strings > should appear to be UTF-32. +1. Bill

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Bill Janssen
> >I'm thinking about making all character strings Unicode (possibly with > >different internal representations a la NSString in Apple's Objective > >C) and introduce a separate mutable bytes array data type. But I could > >use some validation or feedback on this idea from actual > >practitioners.

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread M.-A. Lemburg
Bengt Richter wrote: > Please bear with me for a few paragraphs ;-) Please note that source code encoding doesn't really have anything to do with the way the interpreter executes the program - it's merely a way to tell the parser how to convert string literals (currently only the Unicode ones) into

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread M.-A. Lemburg
Neil Hodgson wrote: > Guido van Rossum: > > >>Folks, please focus on what Python 3000 should do. >> >>I'm thinking about making all character strings Unicode (possibly with >>different internal representations a la NSString in Apple's Objective >>C) and introduce a separate mutable bytes array da

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-24 Thread Neil Hodgson
Martin v. Löwis: > That's very tricky. If you have multiple implementations, you make > usage at the C API difficult. If you make it either UTF-8 or UTF-32, > you make PythonWin difficult. If you make it UTF-16, you make indexing > difficult. For Windows, the code will get a little uglier, nee

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Martin v. Löwis
Phillip J. Eby wrote: > I'm tempted to say it would be even better if there was a command line > option that could be used to force all binary opens to result in bytes, and > require all text opens to specify an encoding. For Python 3000? -1. There shouldn't be command line switches that have th

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Martin v. Löwis
Neil Hodgson wrote: >I'd like to more tightly define Unicode strings for Python 3000. > Currently, Unicode strings may be implemented with either 2 byte > (UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to > contain any Unicode character and should be indexable yielding > chara

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Neil Hodgson
Guido van Rossum: > Folks, please focus on what Python 3000 should do. > > I'm thinking about making all character strings Unicode (possibly with > different internal representations a la NSString in Apple's Objective > C) and introduce a separate mutable bytes array data type. But I could > use s

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Phillip J. Eby
At 06:06 PM 10/23/2005 -0700, Guido van Rossum wrote: >Folks, please focus on what Python 3000 should do. > >I'm thinking about making all character strings Unicode (possibly with >different internal representations a la NSString in Apple's Objective >C) and introduce a separate mutable bytes array

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Bob Ippolito
On Oct 23, 2005, at 6:06 PM, Guido van Rossum wrote: > Folks, please focus on what Python 3000 should do. > > I'm thinking about making all character strings Unicode (possibly with > different internal representations a la NSString in Apple's Objective > C) and introduce a separate mutable bytes a

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Guido van Rossum
Folks, please focus on what Python 3000 should do. I'm thinking about making all character strings Unicode (possibly with different internal representations a la NSString in Apple's Objective C) and introduce a separate mutable bytes array data type. But I could use some validation or feedback on

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Stephan Richter
On Sunday 23 October 2005 18:10, Jason Orendorff wrote: > -1 on keeping the source encoding of string literals.  Python should > definitely decode them at compile time. > > -1 on decoding implicitly "as needed".  This causes decoding to happen > late, in unpredictable places.  Decodes can fail; the

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Bob Ippolito
On Oct 23, 2005, at 3:10 PM, Jason Orendorff wrote: > -1 on decoding implicitly "as needed". This causes decoding to happen > late, in unpredictable places. Decodes can fail; they should happen > as early and as close to the data source as possible. That's not necessarily true... Some codecs c

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-23 Thread Jason Orendorff
-1 on keeping the source encoding of string literals. Python should definitely decode them at compile time. -1 on decoding implicitly "as needed". This causes decoding to happen late, in unpredictable places. Decodes can fail; they should happen as early and as close to the data source as possi
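The "decode early, close to the data source" argument can be illustrated with a short Python 3 sketch (an assumption; the thread discusses Python 2.x). load_text is a hypothetical boundary function, and the byte strings are invented sample data. Decoding at the boundary means malformed input fails there, rather than at some later, unrelated operation.

```python
def load_text(raw, encoding="utf-8"):
    """Decode at the input boundary so bad data is rejected immediately."""
    return raw.decode(encoding)

good = load_text("caf\u00e9".encode("utf-8"))

try:
    load_text(b"caf\xe9")          # Latin-1 bytes, not valid UTF-8
except UnicodeDecodeError:
    rejected_at_boundary = True
else:
    rejected_at_boundary = False

print(ascii(good), rejected_at_boundary)
```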

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-16 Thread Martin v. Löwis
Martin Blais wrote: >>Yes. setdefaultencoding() is removed from sys by site.py. To get it >>again you must reload sys. > > > Thanks. Actually, I should take the opportunity to advise people that setdefaultencoding doesn't really work. With the default default encoding, strings and Unicode object

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-16 Thread Martin Blais
On 10/15/05, Reinhold Birkenfeld <[EMAIL PROTECTED]> wrote: > Martin Blais wrote: > > On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote: > >> Martin Blais <[EMAIL PROTECTED]> writes: > >> > >> > How hard would that be to implement? > >> > >> import sys > >> reload(sys) > >> sys.setdefaultencodin

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-15 Thread Reinhold Birkenfeld
Martin Blais wrote: > On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote: >> Martin Blais <[EMAIL PROTECTED]> writes: >> >> > How hard would that be to implement? >> >> import sys >> reload(sys) >> sys.setdefaultencoding('undefined') > > Hmmm any particular reason for the call to reload() here?

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-14 Thread Martin Blais
On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote: > Martin Blais <[EMAIL PROTECTED]> writes: > > > How hard would that be to implement? > > import sys > reload(sys) > sys.setdefaultencoding('undefined') Hmmm any particular reason for the call to reload() here?

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Josiah Carlson wrote: > > > and isn't pure ASCII. > > > > How can you be sure that something that is /semantically textual/ will > > always remain "pure ASCII" ? That's contradictory, unless your software > > never goes out of the anglo-saxon world (and even...). > > Non-unicode text input widgets

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Josiah Carlson
Antoine Pitrou <[EMAIL PROTECTED]> wrote: > > Le lundi 03 octobre 2005 à 14:59 +0200, Fredrik Lundh a écrit : > > Antoine Pitrou wrote: > > > > > A good rule of thumb is to convert to unicode everything that is > > > semantically textual > > > > and isn't pure ASCII. > > How can you be sure th

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Jim Fulton wrote: > I would argue that it's evil to change the default encoding > in the first place, except in this case to disable implicit > encoding or decoding. absolutely. unfortunately, all attempts to add such information to the sys module documentation seem to have failed... (last time

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Jim Fulton
M.-A. Lemburg wrote: > Michael Hudson wrote: > >>Martin Blais <[EMAIL PROTECTED]> writes: >> >> >> >>>What if we could completely disable the implicit conversions between >>>unicode and str? In other words, if you would ALWAYS be forced to >>>call either .encode() or .decode() to convert between

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Jim Fulton
Martin Blais wrote: > Hi. > > Like a lot of people (or so I hear in the blogosphere...), I've been > experiencing some friction in my code with unicode conversion > problems. Even when being super extra careful with the types of str's > or unicode objects that my variables can contain, there is a

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Martin Blais
On 10/3/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote: > > > > I'm not sure it's a sensible default. > > Me neither, especially since this would make it impossible > to write polymorphic code - e.g. ', '.join(list) wouldn't > work anymore if list contains Unicode; ditto for u', '.join(list) > with lis

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Antoine Pitrou
Le lundi 03 octobre 2005 à 14:59 +0200, Fredrik Lundh a écrit : > Antoine Pitrou wrote: > > > A good rule of thumb is to convert to unicode everything that is > > semantically textual > > and isn't pure ASCII. How can you be sure that something that is /semantically textual/ will always remain

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Fredrik Lundh
Antoine Pitrou wrote: > A good rule of thumb is to convert to unicode everything that is > semantically textual and isn't pure ASCII. (anyone who is tempted to argue otherwise should benchmark their applications, both speed- and memorywise, and be prepared to come up with very strong arguments

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Antoine Pitrou
Le lundi 03 octobre 2005 à 02:09 -0400, Martin Blais a écrit : > > What if we could completely disable the implicit conversions between > unicode and str? This would be very annoying when dealing with some modules or libraries where the type (str / unicode) returned by a function depends on the c

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread M.-A. Lemburg
Michael Hudson wrote: > Martin Blais <[EMAIL PROTECTED]> writes: > > >>What if we could completely disable the implicit conversions between >>unicode and str? In other words, if you would ALWAYS be forced to >>call either .encode() or .decode() to convert between one and the >>other... wouldn't

Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-03 Thread Michael Hudson
Martin Blais <[EMAIL PROTECTED]> writes: > What if we could completely disable the implicit conversions between > unicode and str? In other words, if you would ALWAYS be forced to > call either .encode() or .decode() to convert between one and the > other... wouldn't that help a lot deal with tha
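For readers wondering what "always forced to call .encode() or .decode()" looks like in practice, the sketch below shows the behaviour under Python 3, where text and bytes are separate types and mixing them raises an error. This is an assumption about the reader's interpreter; the Python 2.x versions under discussion in the thread converted implicitly instead. The variable names are illustrative only.

```python
text = "d\u00e9j\u00e0 vu"          # a text string
data = text.encode("utf-8")         # explicit conversion to bytes

try:
    text + data                     # no implicit conversion is attempted
except TypeError:
    mixing_allowed = False
else:
    mixing_allowed = True

round_trip = data.decode("utf-8")   # explicit conversion back to text
print(mixing_allowed, round_trip == text)
```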