[Greg Ewing]
>> All development is done in house by French people. All documentation,
>> external or internal, comments, identifier and function names,
>> everything is in French.
> There's nothing stopping you from creating your own Frenchified
> version of Python that lets you use all the c
François Pinard wrote:
> All development is done in house by French people. All documentation,
> external or internal, comments, identifier and function names,
> everything is in French.
There's nothing stopping you from creating your own
Frenchified version of Python that lets you use all
the
Adam Olsen wrote:
> On 10/30/05, François Pinard <[EMAIL PROTECTED]> wrote:
>
>>All development is done in house by French people. All documentation,
>>external or internal, comments, identifier and function names,
>>everything is in French. Some of the developers here have had a long
>>programm
On 10/30/05, François Pinard <[EMAIL PROTECTED]> wrote:
> All development is done in house by French people. All documentation,
> external or internal, comments, identifier and function names,
> everything is in French. Some of the developers here have had a long
> programming life, while they on
[Martin von Löwis]
> My canonical example is François Pinard, who keeps requesting it,
> saying that local people were surprised they couldn't use accented
> characters in Python. Perhaps that's because he actually is Quebecian
> :-)
I presume I should comment a bit on this.
People here are
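For later readers: Python 3 eventually allowed non-ASCII identifiers (PEP 3131), which covers the accented names discussed here. A minimal check, assuming Python 3. The identifier names are arbitrary examples.

```python
# Python 3 accepts non-ASCII identifiers (PEP 3131), so French names work.
print("élève".isidentifier())    # True
print("forêt_2".isidentifier())  # True

namespace = {}
# Run a tiny program whose variable names carry accents.
exec("prénom = 'François'\nlongueur = len(prénom)", namespace)
print(namespace["longueur"])     # 8
```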
At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote:
>Bengt Richter wrote:
>> Please bear with me for a few paragraphs ;-)
>
>Please note that source code encoding doesn't really have
>anything to do with the way the interpreter executes the
>program - it's merely a way to tell the parser how to
>conver
M.-A. Lemburg:
> You mean a slice that slices out the next ?
Yes.
> This sounds a lot like you'd want iterators for the various
> index types. Should be possible to implement on top of the
> proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc.
Iterators may be helpful, but can a
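A rough sketch of the kind of iterators Lemburg mentions. `itercodepoints` and `itergraphemes` are hypothetical helpers, not an existing API. The grapheme version only groups a base character with the combining marks that follow it (via `unicodedata.combining`), which approximates, but does not implement, full grapheme-cluster segmentation as defined in Unicode UAX #29.

```python
import unicodedata
from typing import Iterator

def itercodepoints(u: str) -> Iterator[str]:
    """Yield one code point at a time (hypothetical helper)."""
    yield from u

def itergraphemes(u: str) -> Iterator[str]:
    """Yield a base character together with its trailing combining marks.

    Approximation only: ignores Hangul jamo, emoji ZWJ sequences and the
    other rules of full grapheme segmentation (UAX #29).
    """
    cluster = ""
    for ch in u:
        if cluster and unicodedata.combining(ch):
            cluster += ch          # attach the mark to the current cluster
        else:
            if cluster:
                yield cluster
            cluster = ch           # start a new cluster
    if cluster:
        yield cluster

text = "e\u0301te\u0301"   # "été" written with combining acute accents
print(list(itercodepoints(text)))   # 5 code points
print(list(itergraphemes(text)))    # 3 user-visible characters
```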
Guido van Rossum wrote:
> Yes but why? What does this invariant do for him?
I don't know about this person, but there are a few things that
don't work properly in UTF-16 mode:
- the Unicode character database fails to look up things.
u"\U0001D670".isupper() gives false, but should give true
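On an interpreter that stores full code points (any Python 3.3+ build, per PEP 393), the character-database lookup Martin describes works for characters outside the BMP. A small check, assuming Python 3.3 or later:

```python
import unicodedata

ch = "\U0001D670"  # outside the Basic Multilingual Plane
print(unicodedata.name(ch))      # MATHEMATICAL MONOSPACE CAPITAL A
print(unicodedata.category(ch))  # Lu (uppercase letter)
print(ch.isupper())              # True
```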
On 10/25/05, Bill Janssen <[EMAIL PROTECTED]> wrote:
> I think he was more interested in the invariant Martin proposed, that
>
> len("\U0001")
>
> should always be the same and should always be 1.
Yes but why? What does this invariant do for him?
--
--Guido van Rossum (home page: http://www.
I think he was more interested in the invariant Martin proposed, that
len("\U0001")
should always be the same and should always be 1.
Bill
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubsc
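The invariant under discussion is what Python 3.3+ eventually guarantees: a string's length counts code points, whatever the build. The sketch below assumes Python 3.3+ and uses U+10000 only as an arbitrary non-BMP code point. It contrasts that length with the number of UTF-16 code units the same character needs.

```python
s = "\U00010000"   # first code point outside the Basic Multilingual Plane

# Length in code points: always 1 on Python 3.3+ (PEP 393).
print(len(s))  # 1

# Length in UTF-16 code units: 2, because a surrogate pair is needed.
utf16_units = len(s.encode("utf-16-le")) // 2
print(utf16_units)  # 2
```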
Bill Janssen wrote:
> I just got mail this morning from a researcher who wants exactly what
> Martin described, and wondered why the default MacPython 2.4.2 didn't
> provide it by default. :-)
If all he wants is to represent Deseret, he can do so in a 16-bit
Unicode type, too: Python supports UTF-
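Martin's point is that a 16-bit string type can still carry Deseret text, which lives outside the BMP (its block starts at U+10400), by storing each character as a UTF-16 surrogate pair. A hand-rolled sketch of that encoding step. `to_surrogates` is a hypothetical helper, and the codec call is used only to cross-check it.

```python
from typing import Tuple

def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = cp - 0x10000               # 20 significant bits remain
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low

high, low = to_surrogates(0x10400)      # DESERET CAPITAL LETTER LONG I
print(hex(high), hex(low))              # 0xd801 0xdc00

# Cross-check with Python's own UTF-16 codec (big-endian, no BOM).
print("\U00010400".encode("utf-16-be").hex())  # d801dc00
```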
Bengt Richter wrote:
> At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote:
>
>>Bengt Richter wrote:
>>
>>>Please bear with me for a few paragraphs ;-)
>>
>>Please note that source code encoding doesn't really have
>>anything to do with the way the interpreter executes the
>>program - it's merely a way
Neil Hodgson wrote:
> M.-A. Lemburg:
>
>
>>Unicode has the concept of combining code points, e.g. you can
>>store an "é" (e with an accent) as "e" + "'". Now if you slice
>>off the accent, you'll break the character that you encoded
>>using combining code points.
>>...
>>next_(u, index) -> int
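Lemburg's combining-character example can be reproduced directly. A minimal sketch, assuming Python 3 and using U+0301 COMBINING ACUTE ACCENT in place of the apostrophe in the quote: slicing off the last code point leaves a bare "e", while NFC normalization folds the pair into one precomposed code point.

```python
import unicodedata

decomposed = "caf" + "e\u0301"   # "café" with a combining acute accent
print(len(decomposed))            # 5 code points, 4 visible characters

# Slicing by code point drops the accent but keeps the base letter.
print(decomposed[:-1])            # cafe

# NFC composes e + U+0301 into U+00E9, so code points match characters here.
composed = unicodedata.normalize("NFC", decomposed)
print(len(composed), composed == "caf\u00e9")  # 4 True
```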
Guido writes:
> Oh, I don't doubt that they want it. But often they don't *need* it,
> and the higher-level goal they are trying to accomplish can be dealt
> with better in a different way. (Sort of my response to people asking
> for static typing in Python as well. :-)
I suppose that's true. But
Guido van Rossum wrote:
> Python's slice-and-dice model pretty much ensures that indexing is
> common. Almost everything is ultimately represented as indices: regex
> search results have the index in the API, find()/index() return
> indices, many operations take a start and/or end index.
Maybe th
Guido van Rossum wrote:
> I think the API should reflect the representation *to some extent*,
> namely it shouldn't claim to have operations that are typically
> thought of as O(1) that can only be implemented as O(n).
Maybe a compromise could be reached by using a
btree of chunks or something, s
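To make Guido's O(1)-versus-O(n) point concrete: if a string were stored only as UTF-8, locating the i-th code point means scanning from the start, because each character takes 1 to 4 bytes. `utf8_offset` is a hypothetical helper written for illustration. It counts lead bytes and skips continuation bytes (those of the form `0b10xxxxxx`).

```python
def utf8_offset(data: bytes, index: int) -> int:
    """Return the byte offset of code point number `index` in UTF-8 `data`.

    Runs in O(n): every byte before the target must be inspected.
    """
    seen = -1
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:    # not a continuation byte: a new code point
            seen += 1
            if seen == index:
                return pos
    raise IndexError("code point index out of range")

text = "naïve €5"
data = text.encode("utf-8")
off = utf8_offset(data, 6)           # position of "€"
print(off)                           # 7
print(data[off:].decode("utf-8"))    # €5
```

A btree of chunks, as Martin suggests, would cut the scan down by recording code-point counts per chunk, at the cost of a more complex structure.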
On 10/24/05, Bill Janssen <[EMAIL PROTECTED]> wrote:
> > > - yet others think: "I want all of Unicode, with proper, efficient
> > >indexing, so I want four bytes per char".
> >
> > I doubt the last one though. Probably they really don't want efficient
> > indexing, they want to perform higher-l
> > - yet others think: "I want all of Unicode, with proper, efficient
> >indexing, so I want four bytes per char".
>
> I doubt the last one though. Probably they really don't want efficient
> indexing, they want to perform higher-level operations that currently
> are only possible using effic
M.-A. Lemburg:
> Unicode has the concept of combining code points, e.g. you can
> store an "é" (e with an accent) as "e" + "'". Now if you slice
> off the accent, you'll break the character that you encoded
> using combining code points.
> ...
> next_(u, index) -> integer
>
> Returns th
Antoine Pitrou wrote:
>>There are many design alternatives:
>
> Wouldn't it be simpler to use:
> - one-byte representation if every character <= 0xFF
> - two-byte representation if every character <= 0x
> - four-byte representation otherwise
As I said: there are many alternatives. This one ha
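Pitrou's scheme is close to what CPython later adopted in PEP 393 (Python 3.3). A sketch of the width selection only. `storage_width` is a hypothetical helper, and the two-byte cutoff of U+FFFF (the end of the BMP, as in PEP 393) is an assumption, since the quoted bound is truncated above.

```python
def storage_width(s: str) -> int:
    """Bytes per character needed to store every code point of `s`."""
    highest = max(map(ord, s), default=0)
    if highest <= 0xFF:
        return 1        # Latin-1 range
    if highest <= 0xFFFF:
        return 2        # Basic Multilingual Plane (assumed cutoff)
    return 4            # anything beyond the BMP

for sample in ("hello", "café", "€uro", "\U00010400"):
    print(repr(sample), storage_width(sample))
```

Because one width is chosen per string from its largest code point, indexing within that string stays O(1).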
On 10/24/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > Changing the APIs would be much work, although perhaps not impossible
> > for Python 3000. For example, Raymond Hettinger's partition() API
> > doesn't refer to indices at all, and can replace many uses of find()
Guido van Rossum wrote:
> Changing the APIs would be much work, although perhaps not impossible
> for Python 3000. For example, Raymond Hettinger's partition() API
> doesn't refer to indices at all, and can replace many uses of find()
> or index().
I think Neil's proposal is not to make them go awa
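Raymond Hettinger's `str.partition()` (added in Python 2.5) shows the index-free style Guido mentions. A small comparison, assuming Python 3. The header string is an arbitrary example.

```python
header = "Content-Type: text/plain; charset=utf-8"

# Index-based: find() returns a position that must then be sliced around
# (and would return -1 if the separator were missing).
i = header.find(":")
name_by_index, value_by_index = header[:i], header[i + 1:].strip()

# Index-free: partition() returns (before, separator, after) directly.
name, sep, value = header.partition(":")
value = value.strip()

print(name, "|", value)   # Content-Type | text/plain; charset=utf-8
```

When the separator is absent, `partition()` returns the whole string followed by two empty strings, so no special -1 check is needed.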
On 10/24/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Indeed. My guess is that indexing is more common than you think,
> especially when iterating over the string. Of course, iteration
> could also operate on UTF-8, if you introduced string iterator
> objects.
Python's slice-and-dice model p
> There are many design alternatives: one option would be to support
> *three* internal representations in a single type, generating the
> others from whichever one already exists, as needed. The default, initial
> representation might be UTF-8, with UCS-4 only being generated when
> indexing occurs,
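One way to picture the "several internal representations" option: keep UTF-8 as the primary form and build a code-point-indexable form only when indexing first happens. `LazyText` is a hypothetical class for illustration. It uses a Python `str` as a stand-in for the UCS-4 array.

```python
from typing import Optional

class LazyText:
    """Text stored as UTF-8, with an indexable form generated on demand."""

    def __init__(self, utf8: bytes) -> None:
        self._utf8 = utf8
        self._decoded: Optional[str] = None   # stand-in for a UCS-4 buffer

    def _indexable(self) -> str:
        if self._decoded is None:             # decode once, on first use
            self._decoded = self._utf8.decode("utf-8")
        return self._decoded

    def __getitem__(self, index):
        return self._indexable()[index]

    def __len__(self) -> int:
        # Simplification: a real design could count code points without
        # materializing the wide form.
        return len(self._indexable())

    def utf8(self) -> bytes:
        return self._utf8                     # no conversion needed

t = LazyText("Łódź".encode("utf-8"))
print(t.utf8())      # returned as stored, nothing decoded yet
print(t[0], len(t))  # first indexing triggers the decode
```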
M.-A. Lemburg wrote:
> There seems to be a general misunderstanding here: even if you
> have UCS4 storage, it is still possible to slice a Unicode
> string in a way which makes rendering it correctly.
[impossible?]
> Unicode has the concept of
Neil Hodgson wrote:
>For Windows, the code will get a little uglier, needing to perform
> an allocation/encoding and deallocation more often than at present but
> I don't think there will be a speed degradation as Windows is
> currently performing a conversion from 8 bit to UTF-16 inside many
>
> Python should allow strings to
> contain any Unicode character and should be indexable yielding
> characters rather than half characters. Therefore Python strings
> should appear to be UTF-32.
+1.
Bill
> >I'm thinking about making all character strings Unicode (possibly with
> >different internal representations a la NSString in Apple's Objective
> >C) and introduce a separate mutable bytes array data type. But I could
> >use some validation or feedback on this idea from actual
> >practitioners.
Bengt Richter wrote:
> Please bear with me for a few paragraphs ;-)
Please note that source code encoding doesn't really have
anything to do with the way the interpreter executes the
program - it's merely a way to tell the parser how to
convert string literals (currently only the Unicode ones)
into
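Lemburg's point that the encoding declaration only tells the parser how to decode literals can be observed with `compile()`, which honors a PEP 263 coding cookie when it is given bytes. A sketch assuming Python 3, where all string literals are Unicode. The Latin-1 source is an arbitrary example.

```python
# Source bytes written in Latin-1, with a PEP 263 encoding declaration.
source = "# -*- coding: latin-1 -*-\nword = 'd\u00e9j\u00e0'\n".encode("latin-1")

namespace = {}
exec(compile(source, "<latin1-demo>", "exec"), namespace)

# The literal was decoded once, at compile time; the runtime value is text.
print(namespace["word"] == "d\u00e9j\u00e0")  # True
print(len(namespace["word"]))                 # 4
```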
Neil Hodgson wrote:
> Guido van Rossum:
>
>
>>Folks, please focus on what Python 3000 should do.
>>
>>I'm thinking about making all character strings Unicode (possibly with
>>different internal representations a la NSString in Apple's Objective
>>C) and introduce a separate mutable bytes array da
Martin v. Löwis:
> That's very tricky. If you have multiple implementations, you make
> usage at the C API difficult. If you make it either UTF-8 or UTF-32,
> you make PythonWin difficult. If you make it UTF-16, you make indexing
> difficult.
For Windows, the code will get a little uglier, nee
Phillip J. Eby wrote:
> I'm tempted to say it would be even better if there was a command line
> option that could be used to force all binary opens to result in bytes, and
> require all text opens to specify an encoding.
For Python 3000? -1. There shouldn't be command line switches that have
th
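Python 3 ended up close to Eby's idea, though as default behavior rather than a command-line switch: binary-mode files yield `bytes`, and text-mode files yield `str`, decoded with an encoding that can be passed explicitly (otherwise a platform-dependent default applies). The temporary file and its contents are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Text mode: str in, encoded on write with the stated encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write("crème brûlée")

    # Binary mode: bytes out, no decoding at all.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with an explicit encoding: str out.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

print(type(raw).__name__, len(raw))    # bytes 15
print(type(text).__name__, len(text))  # str 12
```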
Neil Hodgson wrote:
>I'd like to more tightly define Unicode strings for Python 3000.
> Currently, Unicode strings may be implemented with either 2 byte
> (UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to
> contain any Unicode character and should be indexable yielding
> chara
Guido van Rossum:
> Folks, please focus on what Python 3000 should do.
>
> I'm thinking about making all character strings Unicode (possibly with
> different internal representations a la NSString in Apple's Objective
> C) and introduce a separate mutable bytes array data type. But I could
> use s
At 06:06 PM 10/23/2005 -0700, Guido van Rossum wrote:
>Folks, please focus on what Python 3000 should do.
>
>I'm thinking about making all character strings Unicode (possibly with
>different internal representations a la NSString in Apple's Objective
>C) and introduce a separate mutable bytes array
On Oct 23, 2005, at 6:06 PM, Guido van Rossum wrote:
> Folks, please focus on what Python 3000 should do.
>
> I'm thinking about making all character strings Unicode (possibly with
> different internal representations a la NSString in Apple's Objective
> C) and introduce a separate mutable bytes a
Folks, please focus on what Python 3000 should do.
I'm thinking about making all character strings Unicode (possibly with
different internal representations a la NSString in Apple's Objective
C) and introduce a separate mutable bytes array data type. But I could
use some validation or feedback on
On Sunday 23 October 2005 18:10, Jason Orendorff wrote:
> -1 on keeping the source encoding of string literals. Python should
> definitely decode them at compile time.
>
> -1 on decoding implicitly "as needed". This causes decoding to happen
> late, in unpredictable places. Decodes can fail; the
On Oct 23, 2005, at 3:10 PM, Jason Orendorff wrote:
> -1 on decoding implicitly "as needed". This causes decoding to happen
> late, in unpredictable places. Decodes can fail; they should happen
> as early and as close to the data source as possible.
That's not necessarily true... Some codecs c
-1 on keeping the source encoding of string literals. Python should
definitely decode them at compile time.
-1 on decoding implicitly "as needed". This causes decoding to happen
late, in unpredictable places. Decodes can fail; they should happen
as early and as close to the data source as possi
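Orendorff's "decode early, close to the data source" rule is often applied as a boundary function that turns incoming bytes into text immediately and fails loudly there. A minimal sketch. `read_text_field` is a hypothetical helper, and strict UTF-8 is an assumed input encoding.

```python
def read_text_field(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode at the boundary so bad input fails here, not deep inside."""
    try:
        return raw.decode(encoding)          # errors="strict" is the default
    except UnicodeDecodeError as exc:
        raise ValueError(f"input is not valid {encoding}: {exc}") from exc

print(read_text_field(b"Gr\xc3\xbc\xc3\x9fe"))   # Grüße

try:
    read_text_field(b"Gr\xfc\xdfe")              # Latin-1 bytes, not UTF-8
except ValueError as err:
    print("rejected:", err)
```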
Martin Blais wrote:
>>Yes. setdefaultencoding() is removed from sys by site.py. To get it
>>again you must reload sys.
>
>
> Thanks.
Actually, I should take the opportunity to advise people that
setdefaultencoding doesn't really work. With the default default
encoding, strings and Unicode object
On 10/15/05, Reinhold Birkenfeld <[EMAIL PROTECTED]> wrote:
> Martin Blais wrote:
> > On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote:
> >> Martin Blais <[EMAIL PROTECTED]> writes:
> >>
> >> > How hard would that be to implement?
> >>
> >> import sys
> >> reload(sys)
> >> sys.setdefaultencodin
Martin Blais wrote:
> On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote:
>> Martin Blais <[EMAIL PROTECTED]> writes:
>>
>> > How hard would that be to implement?
>>
>> import sys
>> reload(sys)
>> sys.setdefaultencoding('undefined')
>
> Hmmm any particular reason for the call to reload() here?
On 10/3/05, Michael Hudson <[EMAIL PROTECTED]> wrote:
> Martin Blais <[EMAIL PROTECTED]> writes:
>
> > How hard would that be to implement?
>
> import sys
> reload(sys)
> sys.setdefaultencoding('undefined')
Hmmm any particular reason for the call to reload() here?
Josiah Carlson wrote:
> > > and isn't pure ASCII.
> >
> > How can you be sure that something that is /semantically textual/ will
> > always remain "pure ASCII" ? That's contradictory, unless your software
> > never goes out of the Anglo-Saxon world (and even...).
>
> Non-unicode text input widgets
Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>
> On Monday 03 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
> > Antoine Pitrou wrote:
> >
> > > A good rule of thumb is to convert to unicode everything that is
> > > semantically textual
> >
> > and isn't pure ASCII.
>
> How can you be sure th
Jim Fulton wrote:
> I would argue that it's evil to change the default encoding
> in the first place, except in this case to disable implicit
> encoding or decoding.
absolutely. unfortunately, all attempts to add such information to the
sys module documentation seem to have failed...
(last time
M.-A. Lemburg wrote:
> Michael Hudson wrote:
>
>>Martin Blais <[EMAIL PROTECTED]> writes:
>>
>>
>>
>>>What if we could completely disable the implicit conversions between
>>>unicode and str? In other words, if you would ALWAYS be forced to
>>>call either .encode() or .decode() to convert between
Martin Blais wrote:
> Hi.
>
> Like a lot of people (or so I hear in the blogosphere...), I've been
> experiencing some friction in my code with unicode conversion
> problems. Even when being super extra careful with the types of str's
> or unicode objects that my variables can contain, there is a
On 10/3/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> >
> > I'm not sure it's a sensible default.
>
> Me neither, especially since this would make it impossible
> to write polymorphic code - e.g. ', '.join(list) wouldn't
> work anymore if list contains Unicode; ditto for u', '.join(list)
> with lis
On Monday 03 October 2005 at 14:59 +0200, Fredrik Lundh wrote:
> Antoine Pitrou wrote:
>
> > A good rule of thumb is to convert to unicode everything that is
> > semantically textual
>
> and isn't pure ASCII.
How can you be sure that something that is /semantically textual/ will
always remain
Antoine Pitrou wrote:
> A good rule of thumb is to convert to unicode everything that is
> semantically textual
and isn't pure ASCII.
(anyone who is tempted to argue otherwise should benchmark their
applications, both speed- and memorywise, and be prepared to come
up with very strong arguments
On Monday 03 October 2005 at 02:09 -0400, Martin Blais wrote:
>
> What if we could completely disable the implicit conversions between
> unicode and str?
This would be very annoying when dealing with some modules or libraries
where the type (str / unicode) returned by a function depends on the
c
Michael Hudson wrote:
> Martin Blais <[EMAIL PROTECTED]> writes:
>
>
>>What if we could completely disable the implicit conversions between
>>unicode and str? In other words, if you would ALWAYS be forced to
>>call either .encode() or .decode() to convert between one and the
>>other... wouldn't
Martin Blais <[EMAIL PROTECTED]> writes:
> What if we could completely disable the implicit conversions between
> unicode and str? In other words, if you would ALWAYS be forced to
> call either .encode() or .decode() to convert between one and the
> other... wouldn't that help a lot deal with tha
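What Blais asks for is, in effect, how Python 3 behaves: `str` and `bytes` never convert implicitly, so mixing them in concatenation or `join()` raises `TypeError`, and the programmer must call `.encode()` or `.decode()`. A short demonstration, assuming Python 3:

```python
text = "caf\u00e9"
data = b"au lait"

try:
    text + data                      # no implicit decode of bytes
except TypeError as exc:
    print("TypeError:", exc)

try:
    ", ".join([text, data])          # join refuses mixed types too
except TypeError as exc:
    print("TypeError:", exc)

# Explicit conversions make the encoding decision visible.
print(text + " " + data.decode("ascii"))   # café au lait
print(text.encode("utf-8") + b" " + data)  # b'caf\xc3\xa9 au lait'
```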