Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-30 Thread Serhiy Storchaka
On 30.05.12 14:26, Victor Stinner wrote: I implemented something like that, and it was not efficient and very complex. See for example the (incomplete) patch for str%args attached to the issue #14687: http://bugs.python.org/file25413/pyunicode_format-2.patch I have seen and commented on this p

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-30 Thread Victor Stinner
>> The "two steps" method is not promising: parsing the format string >> twice is slower than other methods. > > The "1.5 steps" method is more promising -- first parse the format string in > an efficient internal representation, and then allocate the output string > and then write characters (or e

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-30 Thread Serhiy Storchaka
On 30.05.12 01:44, Victor Stinner wrote: The "two steps" method is not promising: parsing the format string twice is slower than other methods. The "1.5 steps" method is more promising -- first parse the format string into an efficient internal representation, and then allocate the output strin
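
As a rough illustration of the "1.5 steps" idea quoted above, the sketch below parses a format string once into a small list of items and then builds the output from that list, so the format string is never scanned a second time. This is only a Python stand-in for what the thread discusses at the C level: the names `parse_format` and `render` are hypothetical, and only `%s`, `%d` and `%%` are handled.

```python
import re

# Hypothetical mini-grammar: only %s, %d and %% are recognised.
_SPEC = re.compile(r"%([sd%])")


def parse_format(fmt):
    """Parse fmt once into a list of ("lit", text) and ("arg", conv) items."""
    items = []
    pos = 0
    for match in _SPEC.finditer(fmt):
        if match.start() > pos:
            items.append(("lit", fmt[pos:match.start()]))
        conv = match.group(1)
        if conv == "%":
            items.append(("lit", "%"))
        else:
            items.append(("arg", conv))
        pos = match.end()
    if pos < len(fmt):
        items.append(("lit", fmt[pos:]))
    return items


def render(items, args):
    """Build the output from the parsed items without re-reading the format."""
    arg_iter = iter(args)
    pieces = []
    for kind, value in items:
        if kind == "lit":
            pieces.append(value)
        elif value == "d":
            pieces.append(str(int(next(arg_iter))))
        else:
            pieces.append(str(next(arg_iter)))
    # With all pieces known, the final length and the widest character are
    # available before the output string is created.
    length = sum(len(piece) for piece in pieces)
    max_char = max((ord(ch) for piece in pieces for ch in piece), default=0)
    return "".join(pieces), length, max_char
```

For example, `render(parse_format("%s has %d items"), ("list", 3))` returns the formatted text together with its length and maximum code point, which is the information a C implementation would need to allocate the result in one go.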

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-29 Thread Glenn Linderman
On 5/29/2012 3:51 PM, Nick Coghlan wrote: On Wed, May 30, 2012 at 8:44 AM, Victor Stinner wrote: I also compared str%args and str.format() with Python 2.7 (byte strings), 3.2 (UTF-16 or UCS-4) and 3.3 (PEP 393): Python 3.3 is as fast as Python 2.7 and sometimes faster! (Whereas Python 3.2 is

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-29 Thread Nick Coghlan
On Wed, May 30, 2012 at 8:44 AM, Victor Stinner wrote: > I also compared str%args and str.format() with Python 2.7 (byte > strings), 3.2 (UTF-16 or UCS-4) and 3.3 (PEP 393): Python 3.3 is as > fast as Python 2.7 and sometimes faster! (Whereas Python 3.2 is 10 to > 30% slower than Python 2 in gene
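
A comparison of this kind can be reproduced with the standard `timeit` module. The sketch below is not the benchmark used in the thread; the statements, the setup string and the helper name `best_time` are assumptions chosen for illustration, and the numbers it prints depend entirely on the interpreter and machine.

```python
import timeit

# Assumed sample data; the original benchmark inputs are not shown in the thread.
SETUP = "name = 'world'; count = 42"


def best_time(stmt, number=100_000, repeat=5):
    """Return the best total time in seconds for `number` runs of stmt."""
    return min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))


if __name__ == "__main__":
    cases = [
        ("str % args", "'%s: %d' % (name, count)"),
        ("str.format()", "'{}: {}'.format(name, count)"),
    ]
    for label, stmt in cases:
        print(f"{label:<14} {best_time(stmt):.4f} s")
```

Running the same script under different Python versions gives a like-for-like comparison, provided the machine is otherwise idle.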

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-29 Thread Victor Stinner
Hi, >  * Use a Py_UCS4 buffer and then convert to the canonical form (ASCII, > UCS1 or UCS2). Approach taken by io.StringIO. io.StringIO is not only > used to write, but also to read and so a Py_UCS4 buffer is a good > compromise. >  * PyAccu API: optimized version of chunks=[]; for ...: ... > chu
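
The Py_UCS4 buffer approach mentioned above can be pictured as follows: collect every character at full width, then pick the narrowest representation once the content is complete. This is a Python stand-in only; the class name `UCS4Buffer` and the kind labels it returns are hypothetical and do not mirror the actual io.StringIO implementation.

```python
class UCS4Buffer:
    """Collect code points at full width, narrow only when finished."""

    def __init__(self):
        # A list of ints stands in for a C array of Py_UCS4 values.
        self.code_points = []

    def write(self, text):
        self.code_points.extend(ord(ch) for ch in text)

    def getvalue(self):
        """Return the text and the narrowest kind that can hold it."""
        max_char = max(self.code_points, default=0)
        if max_char < 0x80:
            kind = "ASCII"
        elif max_char < 0x100:
            kind = "UCS1"
        elif max_char < 0x10000:
            kind = "UCS2"
        else:
            kind = "UCS4"
        return "".join(chr(cp) for cp in self.code_points), kind
```

The trade-off is memory: the buffer uses four bytes per character while it is being filled, even if the final string turns out to be pure ASCII.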

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-04 Thread Serhiy Storchaka
04.05.12 02:45, Victor Stinner wrote: * Two steps: compute the length and maximum character of the output string, allocate the output string and then write characters. str%args was using it. * Optimistic approach. Start with an ASCII buffer, enlarge and widen (to UCS2 and then UCS4) the
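
The two strategies quoted above can be contrasted in a small Python model that uses a `bytearray` as the output buffer. Everything here is a sketch under stated assumptions: the helper names (`kind_for`, `two_step_write`, `OptimisticWriter`, `decode`) are hypothetical, the character widths follow PEP 393 (1, 2 or 4 bytes), and the real code under discussion is C inside CPython.

```python
def kind_for(max_char):
    """Bytes per character needed to store code points up to max_char."""
    if max_char < 0x100:
        return 1
    if max_char < 0x10000:
        return 2
    return 4


def two_step_write(pieces):
    """Step 1: measure length and widest character. Step 2: allocate, write."""
    length = sum(len(piece) for piece in pieces)
    max_char = max((ord(ch) for piece in pieces for ch in piece), default=0)
    width = kind_for(max_char)
    buf = bytearray(length * width)  # allocated once, at its final size
    pos = 0
    for piece in pieces:
        for ch in piece:
            buf[pos:pos + width] = ord(ch).to_bytes(width, "little")
            pos += width
    return buf, width


class OptimisticWriter:
    """Start with 1-byte characters and widen only when forced to."""

    def __init__(self):
        self.width = 1
        self.buf = bytearray()
        self.widenings = 0  # how many times the buffer had to be converted

    def write(self, text):
        for ch in text:
            needed = kind_for(ord(ch))
            if needed > self.width:
                self._widen(needed)
            self.buf += ord(ch).to_bytes(self.width, "little")

    def _widen(self, new_width):
        old_width = self.width
        widened = bytearray()
        for i in range(0, len(self.buf), old_width):
            code_point = int.from_bytes(self.buf[i:i + old_width], "little")
            widened += code_point.to_bytes(new_width, "little")
        self.buf = widened
        self.width = new_width
        self.widenings += 1


def decode(buf, width):
    """Turn a fixed-width little-endian buffer back into a str."""
    return "".join(
        chr(int.from_bytes(buf[i:i + width], "little"))
        for i in range(0, len(buf), width)
    )
```

The two-step version reads its input twice but never copies the buffer; the optimistic version reads the input once and pays for a copy each time a wider character appears.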

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-03 Thread martin
Various notes: * PyUnicode_READ() is slower than reading a Py_UNICODE array. * Some decoders unroll the main loop to process 4 or 8 bytes (32- or 64-bit CPU) at each step. I would like to hear if you know other tricks to optimize Unicode strings in Python, or if you are interested in working on this to
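
The "process 4 or 8 bytes at each step" trick mentioned above usually means testing a whole machine word against a mask instead of checking bytes one by one. The sketch below models that for an ASCII check on 8-byte words; the function name `is_ascii_wordwise` is hypothetical, and in pure Python this is only a model of the logic, not a speed-up, since the real decoders are written in C.

```python
# High bit of each of the 8 bytes in a 64-bit word.
ASCII_MASK = 0x8080808080808080


def is_ascii_wordwise(data):
    """Return True if every byte in data is below 0x80."""
    full = len(data) - len(data) % 8
    # Main loop: one mask test covers 8 bytes at a time.
    for i in range(0, full, 8):
        word = int.from_bytes(data[i:i + 8], "little")
        if word & ASCII_MASK:
            return False
    # Tail: the remaining 0 to 7 bytes are checked individually.
    return all(byte < 0x80 for byte in data[full:])
```

A non-zero result from the mask test means at least one byte in the word has its high bit set, so a decoder would fall back to its slower per-byte path from that position.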

[Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-03 Thread Victor Stinner
Hi, Different people are working on improving the performance of Unicode strings in Python 3.3. This Python version is very different from Python 3.2 because of PEP 393, and it is still unclear to me what the best way to create a new Unicode string is. There are different approaches: * Use the