Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Glenn Linderman
On 8/31/2011 5:58 PM, Neil Hodgson wrote: Glenn Linderman: That said, regexp, or some sort of cursor on a string, might be a workable solution. Will it have adequate performance? Perhaps, at least for some applications. Will it be as conceptually simple as indexing an array of graphemes? No

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Guido van Rossum
On Wed, Aug 31, 2011 at 6:29 PM, Neil Hodgson wrote: > Guido van Rossum: > >> On Wed, Aug 31, 2011 at 5:58 PM, Neil Hodgson wrote: >>> [...] some text drawing engines draw decomposed characters ("o" >>> followed by " ̈" -> "ö") differently compared to their composite >>> equivalents ("ö") and th

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Neil Hodgson
Guido van Rossum: > On Wed, Aug 31, 2011 at 5:58 PM, Neil Hodgson wrote: >> [...] some text drawing engines draw decomposed characters ("o" >> followed by " ̈" -> "ö") differently compared to their composite >> equivalents ("ö") and this may be perceived as better or worse. I'd >> like to offer

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Hagen Fürstenau
>> [...] some text drawing engines draw decomposed characters ("o" >> followed by " ̈" -> "ö") differently compared to their composite >> equivalents ("ö") and this may be perceived as better or worse. I'd >> like to offer an option to replace some decomposed characters with >> their composite equ
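
Note on the quoted point: replacing a decomposed sequence with its composite equivalent is what Unicode normalization form NFC does. The following is a minimal sketch, assuming the standard-library unicodedata module is an acceptable way to do this; it is not taken from the thread, and the variable names are only illustrative.

```python
import unicodedata

# "o" followed by U+0308 COMBINING DIAERESIS: two code points, one visible character.
decomposed = "o\u0308"

# NFC composes the pair into the single code point U+00F6 where a composite exists.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(composed == "\u00f6")

# NFD goes the other way and splits the composite back into base + combining mark.
round_trip = unicodedata.normalize("NFD", composed)
print(round_trip == decomposed)
```

Whether the composed or the decomposed form renders better still depends on the text drawing engine, as the quoted message says.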

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Guido van Rossum
On Wed, Aug 31, 2011 at 5:58 PM, Neil Hodgson wrote: > [...] some text drawing engines draw decomposed characters ("o" > followed by " ̈" -> "ö") differently compared to their composite > equivalents ("ö") and this may be perceived as better or worse. I'd > like to offer an option to replace some

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Neil Hodgson
Glenn Linderman: > That said, regexp, or some sort of cursor on a string, might be a workable > solution.  Will it have adequate performance?  Perhaps, at least for some > applications.  Will it be as conceptually simple as indexing an array of > graphemes?  No.  Will it ever reach the efficiency
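
Note on the "cursor on a string" idea: a rough sketch of what such a cursor could look like is given below. It is an assumption-laden simplification, not the approach proposed in the thread: it groups each base character with the combining marks that follow it, using unicodedata.combining(), and does not implement the full grapheme cluster rules of the Unicode standard. The function name iter_clusters is hypothetical.

```python
import unicodedata


def iter_clusters(text):
    """Yield a base character together with any combining marks that follow it.

    Simplified: only characters with a non-zero canonical combining class are
    attached to the preceding character.
    """
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            # A new base character starts; emit the finished cluster.
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


clusters = list(iter_clusters("o\u0308x"))
print(len("o\u0308x"), len(clusters))
```

Iterating this way is sequential; indexing the n-th cluster still means walking the string from the start, which is the performance question raised in the message.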

Re: [Python-Dev] Python 3 optimizations continued...

2011-08-31 Thread Nick Coghlan
On Thu, Sep 1, 2011 at 3:28 AM, Guido van Rossum wrote: > On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro > Cesare, I'm really sorry that you became so disillusioned that you > abandoned wordcode. I agree that we were too optimistic about Unladen > Swallow. Also that the existence of PyPy and it

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Nick Coghlan
On Thu, Sep 1, 2011 at 8:02 AM, Terry Reedy wrote: > On 8/31/2011 1:10 PM, Guido van Rossum wrote: >> Ok, I dig this, to some extent. However saying it is UCS-2 is equally >> bad. > > As I said on the tracker, our narrow builds are in-between (while moving > closer to UTF-16), and both terms are d

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Terry Reedy
On 8/31/2011 1:10 PM, Guido van Rossum wrote: This is why I find the issue of Python, the language (and stdlib), as a whole "conforming to the Unicode standard" such a troublesome concept -- I think it is something that an application may claim, but the language should make much more modest clai

Re: [Python-Dev] Python 3 optimizations continued...

2011-08-31 Thread Cesare Di Mauro
2011/8/31 Guido van Rossum > On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro > wrote: > > It isn't, because motivation to do something new with CPython vanishes, > at > > least on some areas (virtual machine / ceval.c), even having some ideas > to > > experiment with. That's why in my last tal

Re: [Python-Dev] Python 3 optimizations continued...

2011-08-31 Thread Cesare Di Mauro
2011/8/31 stefan brunthaler > > I think that you must deal with big endianness because some RISC can't > handle > > at all data in little endian format. > > > > In WPython I have written some macros which handle both endiannesses, but > lacking > > big endian machines I never had the opportunity to ver

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Glenn Linderman
On 8/31/2011 10:10 AM, Guido van Rossum wrote: On Tue, Aug 30, 2011 at 11:03 PM, Stephen J. Turnbull wrote: [me] > That sounds like a contradiction -- it wouldn't be a UTF-16 array if > you couldn't tell that it was using UTF-16. Well, that's why I wrote "intended to be suggestive". Th

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Glenn Linderman
On 8/31/2011 5:21 AM, Stephen J. Turnbull wrote: Glenn Linderman writes: > From comments Guido has made, he is not interested in changing the > efficiency or access methods of the str type to raise the level of > support of Unicode to the composed character, or grapheme cluster > co

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Glenn Linderman
On 8/31/2011 10:20 AM, Guido van Rossum wrote: On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman wrote: The str type itself can presently be used to process other character encodings: if they are fixed width < 32-bit elements those encodings might be considered Unicode encodings, but there is no

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Glenn Linderman
On 8/31/2011 11:56 AM, Guido van Rossum wrote: On Wed, Aug 31, 2011 at 11:51 AM, Glenn Linderman wrote: On 8/31/2011 10:12 AM, Guido van Rossum wrote: On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman wrote: So from

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Guido van Rossum
On Wed, Aug 31, 2011 at 11:51 AM, Glenn Linderman wrote: > On 8/31/2011 10:12 AM, Guido van Rossum wrote: > > On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman > wrote: > > So from reading all this discussion, I think this point is rather a key > one... and it has been made repeatedly in diffe

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Glenn Linderman
On 8/31/2011 10:12 AM, Guido van Rossum wrote: On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman wrote: So from reading all this discussion, I think this point is rather a key one... and it has been made repeatedly in different ways: Arrays are not suitable for manipulating Unicode character se

Re: [Python-Dev] Python 3 optimizations continued...

2011-08-31 Thread Guido van Rossum
On Wed, Aug 31, 2011 at 10:08 AM, stefan brunthaler wrote: > Well, my code has primarily been a vehicle for my research in that > area and thus is not immediately suited to adoption [...]. But if you want to be taken seriously as a researcher, you should publish your code! Without publication of

Re: [Python-Dev] Python 3 optimizations continued...

2011-08-31 Thread Guido van Rossum
On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro wrote: > It isn't, because motivation to do something new with CPython vanishes, at > least on some areas (virtual machine / ceval.c), even having some ideas to > experiment with. That's why in my last talk on EuroPython I decided to move > on othe

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Guido van Rossum
On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman wrote: > The str type itself can presently be used to process other > character encodings: if they are fixed width < 32-bit elements those > encodings might be considered Unicode encodings, but there is no requirement > that they are, and some opera

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Guido van Rossum
On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman wrote: > So from reading all this discussion, I think this point is rather a key > one... and it has been made repeatedly in different ways:  Arrays are not > suitable for manipulating Unicode character sequences, and the str type is > an array with

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Guido van Rossum
On Tue, Aug 30, 2011 at 11:03 PM, Stephen J. Turnbull wrote: [me] >  > That sounds like a contradiction -- it wouldn't be a UTF-16 array if >  > you couldn't tell that it was using UTF-16. > > Well, that's why I wrote "intended to be suggestive".  The Unicode > Standard does not specify at all wha

Re: [Python-Dev] Python 3 optimizations continued...

2011-08-31 Thread stefan brunthaler
> So, basically, you built a JIT compiler but don't want to call it that, > right? Just because it compiles byte code to other byte code rather than to > native CPU instructions does not mean it doesn't compile Just In Time. > For me, a definition of a JIT compiler or any dynamic compilation subsys

Re: [Python-Dev] Python 3 optimizations continued...

2011-08-31 Thread stefan brunthaler
> I think that you must deal with big endianness because some RISC can't handle > at all data in little endian format. > > In WPython I have written some macros which handle both endiannesses, but lacking > big endian machines I never had the opportunity to verify if something was > wrong. > I am sorry f

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Stephen J. Turnbull
Glenn Linderman writes: > From comments Guido has made, he is not interested in changing the > efficiency or access methods of the str type to raise the level of > support of Unicode to the composed character, or grapheme cluster > concepts. IMO, that would be a bad idea, as higher-level Unic

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-31 Thread Glenn Linderman
On 8/30/2011 11:03 PM, Stephen J. Turnbull wrote: Guido van Rossum writes: > On Tue, Aug 30, 2011 at 7:55 PM, Stephen J. Turnbull wrote: > > For starters, one that doesn't ever return lone surrogates, but rather > > interprets surrogate pairs as Unicode code points as in UTF-16. (T
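
Note on "interprets surrogate pairs as Unicode code points as in UTF-16": the arithmetic behind that interpretation can be sketched as follows. This is an illustration only, assuming a character outside the Basic Multilingual Plane (U+1F600 is chosen arbitrarily) and using the standard struct module to look at the 16-bit code units.

```python
import struct

ch = "\U0001F600"  # one code point outside the BMP

# Encoded as UTF-16 (little endian), it occupies two 16-bit code units.
high, low = struct.unpack("<2H", ch.encode("utf-16-le"))
print(hex(high), hex(low))

# A high surrogate lies in D800..DBFF and a low surrogate in DC00..DFFF.
assert 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF

# Combining the pair recovers the original code point.
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
print(hex(code_point))
```

An interface that never returns lone surrogates would hand back the combined code point rather than the two code units separately.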

Re: [Python-Dev] Python 3 optimizations continued...

2011-08-31 Thread Stefan Behnel
stefan brunthaler, 30.08.2011 22:41: Ok, then there's something else you haven't told us. Are you saying that the original (old) bytecode is still used (and hence written to and read from .pyc files)? Short answer: yes. Long answer: I added an invocation counter to the code object and keep int