Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Antoine Pitrou
On Thu, 14 Feb 2013 01:21:40 +0100 Victor Stinner wrote: > > UnicodeWriter (using the "writer += str" API) is the fastest method in > most cases, except for data = ['a'*10**4] * 10**2 (in this case, it's > 8x slower!). I guess that the overhead comes from the overallocation > which then requires to

Re: [Python-Dev] [pypy-dev] efficient string concatenation (yep, from 2004)

2013-02-13 Thread Steven D'Aprano
On 14/02/13 01:44, Nick Coghlan wrote: Deliberately *relying* on the += hack to avoid quadratic runtime is just plain wrong, and our documentation already says so. +1 I'm not sure that there's any evidence that people in general are *relying* on the += hack. More likely they write the first

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Steven D'Aprano
On 14/02/13 01:18, Chris Withers wrote: On 13/02/2013 11:53, Steven D'Aprano wrote: I fixed a performance bug in httplib some years ago by doing the exact opposite; += -> ''.join(). In that case, it changed downloading a file from 20 minutes to 3 seconds. That was likely on Python 2.5. I reme

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Victor Stinner
Hi, I wrote a quick hack to expose _PyUnicodeWriter as _string.UnicodeWriter: http://www.haypocalc.com/tmp/string_unicode_writer.patch And I wrote a (micro-)benchmark: http://www.haypocalc.com/tmp/bench_join.py (The benchmark uses only ASCII strings; it would be interesting to test latin1, BMP and

Re: [Python-Dev] efficient string concatenation (yep, from 2004)

2013-02-13 Thread Christian Tismer
On 13.02.13 22:52, Maciej Fijalkowski wrote: On Wed, Feb 13, 2013 at 11:17 PM, Greg Ewing wrote: Steven D'Aprano wrote: The documentation for strings is also clear that you should not rely on this optimization: ... It can, and does, fail on CPython as well, as it is sensitive to memory alloc

Re: [Python-Dev] efficient string concatenation (yep, from 2004)

2013-02-13 Thread Christian Tismer
Hi Lennart, Sent from my Ei4Steve On Feb 13, 2013, at 8:42, Lennart Regebro wrote: >> Something is needed - a patch for PyPy or for the documentation I guess. > > Not arguing that it wouldn't be good, but I disagree that it is needed. > > This is only an issue when you, as in your proof, have

Re: [Python-Dev] efficient string concatenation (yep, from 2004)

2013-02-13 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 11:17 PM, Greg Ewing wrote: > Steven D'Aprano wrote: >> >> The documentation for strings is also clear that you should not rely on >> this >> optimization: >> >> ... > >> >> >> It >> can, and does, fail on CPython as well, as it is sensitive to memory >> allocation details.

Re: [Python-Dev] efficient string concatenation (yep, from 2004)

2013-02-13 Thread Greg Ewing
Steven D'Aprano wrote: The documentation for strings is also clear that you should not rely on this optimization: > ... > It can, and does, fail on CPython as well, as it is sensitive to memory allocation details. If it's that unreliable, why was it ever implemented in the first place? --

Re: [Python-Dev] Marking GC details as CPython-only

2013-02-13 Thread Richard Oudkerk
On 13/02/2013 7:25pm, Antoine Pitrou wrote: I think resurrecting objects from __del__ is crazy, so IMO what you suggest is fine. You mean like subprocess.Popen.__del__? I quite agree. -- Richard

Re: [Python-Dev] Marking GC details as CPython-only

2013-02-13 Thread Barry Warsaw
On Feb 13, 2013, at 08:30 PM, Armin Rigo wrote: >Actually right now, at the exit of the interpreter, we just leave the >program without caring about running any __del__. This might mean >that in a short-running script no __del__ is ever run. I'd add this >question to your original list: is it go

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Lennart Regebro
On Wed, Feb 13, 2013 at 4:02 PM, Amaury Forgeot d'Arc wrote: > 2013/2/13 Lennart Regebro >> >> On Wed, Feb 13, 2013 at 3:27 PM, Amaury Forgeot d'Arc >> wrote: >> > Yes, it's jitted. >> >> Admittedly, I have no idea in which cases the JIT kicks in, and what I >> should do to make that happen to m

Re: [Python-Dev] Marking GC details as CPython-only

2013-02-13 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 9:40 PM, Antoine Pitrou wrote: > On Wed, 13 Feb 2013 20:30:18 +0100 > Armin Rigo wrote: >> Hi, >> >> On Wed, Feb 13, 2013 at 8:22 PM, Maciej Fijalkowski wrote: >> > I think it's well documented you should not rely on stuff like that >> > being run at the exit of the inter

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Lennart Regebro
On Wed, Feb 13, 2013 at 7:06 PM, Maciej Fijalkowski wrote: > I actually wonder. > > There seems to be a consensus to avoid += (to some extent). Can > someone commit the change to urllib then? I'm talking about reverting > http://bugs.python.org/issue1285086 specifically That's unquoting of URL

Re: [Python-Dev] Marking GC details as CPython-only

2013-02-13 Thread Antoine Pitrou
On Wed, 13 Feb 2013 20:30:18 +0100 Armin Rigo wrote: > Hi, > > On Wed, Feb 13, 2013 at 8:22 PM, Maciej Fijalkowski wrote: > > I think it's well documented you should not rely on stuff like that > > being run at the exit of the interpreter. > > Actually right now, at the exit of the interpreter,

Re: [Python-Dev] Marking GC details as CPython-only

2013-02-13 Thread Armin Rigo
Hi, On Wed, Feb 13, 2013 at 8:22 PM, Maciej Fijalkowski wrote: > I think it's well documented you should not rely on stuff like that > being run at the exit of the interpreter. Actually right now, at the exit of the interpreter, we just leave the program without caring about running any __del__.

Re: [Python-Dev] Marking GC details as CPython-only

2013-02-13 Thread Antoine Pitrou
On Wed, 13 Feb 2013 20:48:08 +0200 Maciej Fijalkowski wrote: > > Things where pypy differs: > > * finalizers in pypy will be called only once, even if the object is > resurrected. I'm not sure if this is a detail or we're just plain > incompatible. I think this should be a detail. > * pypy breaks

Re: [Python-Dev] Marking GC details as CPython-only

2013-02-13 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 9:09 PM, Xavier Morel wrote: > On 2013-02-13, at 19:48 , Maciej Fijalkowski wrote: > >> Hi >> >> I've tried (and failed) to find what GC details (especially finalizer >> semantics) are CPython only and which ones are not. The best I could >> find was the documentation of __

Re: [Python-Dev] efficient string concatenation (yep, from 2004)

2013-02-13 Thread R. David Murray
On Wed, 13 Feb 2013 18:07:22 +0100, Christian Tismer wrote: > I think before getting people to work through long and > complete documentation, it is probably easier to wake their interest > by something like > "Hey, are you doing things this way?" > And then there is a short, concise list of bad

Re: [Python-Dev] Marking GC details as CPython-only

2013-02-13 Thread Xavier Morel
On 2013-02-13, at 19:48 , Maciej Fijalkowski wrote: > Hi > > I've tried (and failed) to find what GC details (especially finalizer > semantics) are CPython only and which ones are not. The best I could > find was the documentation of __del__ here: > http://docs.python.org/2/reference/datamodel.ht

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Serhiy Storchaka
On 13.02.13 20:40, Christian Tismer wrote: If += is anyway a bit slower than other ways, forget it. I would then maybe add a comment somewhere that says "avoiding '+=' because it is not reliable" or something. += is the fastest way (in any implementation) if you concatenate only two strings.

[Python-Dev] Marking GC details as CPython-only

2013-02-13 Thread Maciej Fijalkowski
Hi, I've tried (and failed) to find what GC details (especially finalizer semantics) are CPython only and which ones are not. The best I could find was the documentation of __del__ here: http://docs.python.org/2/reference/datamodel.html Things where pypy differs: * finalizers in pypy will be calle
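
The finalizer point raised in this message (an object resurrected from __del__) can be made concrete with a small sketch. This is only an illustration under stated assumptions: the class name Phoenix and the list survivors are hypothetical, and how often __del__ runs for a resurrected object is exactly the implementation detail under discussion, so no particular second count is claimed here.

```python
import gc

survivors = []  # hypothetical module-level list used to resurrect objects


class Phoenix:
    """Hypothetical class whose finalizer makes the instance reachable again."""

    finalizer_calls = 0

    def __del__(self):
        Phoenix.finalizer_calls += 1
        survivors.append(self)  # resurrection: self is reachable again


obj = Phoenix()
del obj        # drop the last reference
gc.collect()   # needed on implementations without reference counting
calls_after_first_drop = Phoenix.finalizer_calls

survivors.clear()  # drop the resurrected object a second time
gc.collect()
calls_after_second_drop = Phoenix.finalizer_calls

# Whether the second number is larger than the first depends on the
# implementation and version; portable code should not rely on either.
print(calls_after_first_drop, calls_after_second_drop)
```

Code that needs deterministic cleanup is usually better served by an explicit close() method or a context manager than by depending on when, or how often, __del__ runs.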

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Christian Tismer
On 13.02.13 19:06, Maciej Fijalkowski wrote: On Wed, Feb 13, 2013 at 7:33 PM, MRAB wrote: On 2013-02-13 13:23, Lennart Regebro wrote: On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka wrote: I prefer "x = '%s%s%s%s' % (a, b, c, d)" when string's number is more than 3 and some of them are lit

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Brett Cannon
On Wed, Feb 13, 2013 at 1:27 PM, Maciej Fijalkowski wrote: > On Wed, Feb 13, 2013 at 8:24 PM, Brett Cannon wrote: > > > > > > > > On Wed, Feb 13, 2013 at 1:06 PM, Maciej Fijalkowski > > wrote: > >> > >> On Wed, Feb 13, 2013 at 7:33 PM, MRAB > wrote: > >> > On 2013-02-13 13:23, Lennart Regebro w

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 8:24 PM, Brett Cannon wrote: > > > > On Wed, Feb 13, 2013 at 1:06 PM, Maciej Fijalkowski > wrote: >> >> On Wed, Feb 13, 2013 at 7:33 PM, MRAB wrote: >> > On 2013-02-13 13:23, Lennart Regebro wrote: >> >> >> >> On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka >> >> wrote

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Brett Cannon
On Wed, Feb 13, 2013 at 1:06 PM, Maciej Fijalkowski wrote: > On Wed, Feb 13, 2013 at 7:33 PM, MRAB wrote: > > On 2013-02-13 13:23, Lennart Regebro wrote: > >> > >> On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka > >> wrote: > >>> > >>> I prefer "x = '%s%s%s%s' % (a, b, c, d)" when string's num

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 7:33 PM, MRAB wrote: > On 2013-02-13 13:23, Lennart Regebro wrote: >> >> On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka >> wrote: >>> >>> I prefer "x = '%s%s%s%s' % (a, b, c, d)" when string's number is more >>> than 3 >>> and some of them are literal strings. >> >> >>

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread MRAB
On 2013-02-13 13:23, Lennart Regebro wrote: On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka wrote: I prefer "x = '%s%s%s%s' % (a, b, c, d)" when string's number is more than 3 and some of them are literal strings. This has the benefit of being slow both on CPython and PyPy. Although using .

Re: [Python-Dev] efficient string concatenation (yep, from 2004)

2013-02-13 Thread Christian Tismer
Hey Nick, On 13.02.13 15:44, Nick Coghlan wrote: On Wed, Feb 13, 2013 at 10:06 PM, Christian Tismer wrote: To avoid such hidden traps in larger code bases, documentation is needed that clearly gives a warning saying "don't do that", like CS students learn for most other languages. How much mo

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Amaury Forgeot d'Arc
2013/2/13 Lennart Regebro > On Wed, Feb 13, 2013 at 3:27 PM, Amaury Forgeot d'Arc > wrote: > > Yes, it's jitted. > > Admittedly, I have no idea in which cases the JIT kicks in, and what I > should do to make that happen to make sure I have the best possible > real-life test cases. > PyPy JIT ki

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Serhiy Storchaka
On 13.02.13 15:17, Daniel Holth wrote: On Wed, Feb 13, 2013 at 7:10 AM, Serhiy Storchaka wrote: I prefer "x = '%s%s%s%s' % (a, b, c, d)" when string's number is more than 3 and some of them are literal strings. Fixed: x = ('%s' * len(abcd)) % abcd No, you
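
For readers comparing the two spellings quoted in this message, here is a minimal sketch. The values of a, b, c, d are made up, and abcd is assumed to be a tuple (the % operator needs one for multiple arguments).

```python
a, b, c, d = "spam", "-", "eggs", "!"
abcd = (a, b, c, d)  # assumed to be a tuple; a list would not work with %

# Fixed number of pieces, written out explicitly.
fixed_count = "%s%s%s%s" % (a, b, c, d)

# Same result for any number of pieces: build the format string first.
any_count = ("%s" * len(abcd)) % abcd

# The join idiom gives the same string when every piece is already a str.
joined = "".join(abcd)

print(fixed_count)
```

One difference worth keeping in mind: %s converts non-string items with str(), while "".join() raises TypeError for them.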

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Lennart Regebro
On Wed, Feb 13, 2013 at 3:27 PM, Amaury Forgeot d'Arc wrote: > Yes, it's jitted. Admittedly, I have no idea in which cases the JIT kicks in, and what I should do to make that happen to make sure I have the best possible real-life test cases. //Lennart

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Lennart Regebro
On Wed, Feb 13, 2013 at 3:27 PM, Amaury Forgeot d'Arc wrote: > > 2013/2/13 Lennart Regebro >> >> On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka >> wrote: >> > I prefer "x = '%s%s%s%s' % (a, b, c, d)" when string's number is more >> > than 3 >> > and some of them are literal strings. >> >> Thi

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Amaury Forgeot d'Arc
2013/2/13 Christian Tismer > On 13.02.13 15:27, Amaury Forgeot d'Arc wrote: > > > 2013/2/13 Lennart Regebro > >> On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka >> wrote: >> > I prefer "x = '%s%s%s%s' % (a, b, c, d)" when string's number is more >> than 3 >> > and some of them are literal str

Re: [Python-Dev] efficient string concatenation (yep, from 2004)

2013-02-13 Thread Nick Coghlan
On Wed, Feb 13, 2013 at 10:06 PM, Christian Tismer wrote: > To avoid such hidden traps in larger code bases, documentation is > needed that clearly gives a warning saying "don't do that", like CS > students learn for most other languages. How much more explicit do you want us to be? """6. CPytho

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Serhiy Storchaka
On 13.02.13 15:23, Lennart Regebro wrote: > On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka wrote: >> I prefer "x = '%s%s%s%s' % (a, b, c, d)" when string's number is more than 3 >> and some of them are literal strings. > > This has the benefit of being slow both on CPython and PyPy. Although >

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Christian Tismer
On 13.02.13 15:27, Amaury Forgeot d'Arc wrote: 2013/2/13 Lennart Regebro On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka wrote: > I prefer "x = '%s%s%s%s' % (a, b, c, d)" when string's number is more than 3 > and some

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Amaury Forgeot d'Arc
2013/2/13 Lennart Regebro > On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka > wrote: > > I prefer "x = '%s%s%s%s' % (a, b, c, d)" when string's number is more > than 3 > > and some of them are literal strings. > > This has the benefit of being slow both on CPython and PyPy. Although > using .f

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Chris Withers
On 13/02/2013 11:53, Steven D'Aprano wrote: I fixed a performance bug in httplib some years ago by doing the exact opposite; += -> ''.join(). In that case, it changed downloading a file from 20 minutes to 3 seconds. That was likely on Python 2.5. I remember it well. http://mail.python.org/pip
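
The kind of change described here, replacing repeated += with a single ''.join(), can be sketched roughly as below. This is not the httplib patch itself; read_chunks, concat_with_plus and concat_with_join are hypothetical names and the chunk data is invented for illustration.

```python
def read_chunks(count, size):
    """Hypothetical stand-in for data arriving in pieces, e.g. from a socket."""
    for i in range(count):
        yield chr(ord("a") + i % 26) * size


def concat_with_plus(chunks):
    # Each += may copy everything accumulated so far, so total work can
    # grow quadratically when no in-place optimization applies.
    result = ""
    for chunk in chunks:
        result += chunk
    return result


def concat_with_join(chunks):
    # Collect the pieces, then build the final string in one pass.
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)


slow = concat_with_plus(read_chunks(100, 10))
fast = concat_with_join(read_chunks(100, 10))
print(slow == fast)
```

Both functions return the same string; they differ only in how much copying an implementation may have to do along the way.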

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Christian Tismer
On 13.02.13 14:17, Daniel Holth wrote: On Wed, Feb 13, 2013 at 7:10 AM, Serhiy Storchaka wrote: On 13.02.13 10:52, Larry Hastings wrote: I've always hated the "".join(array) idiom for "fast" string concatenation--it's ugly and it flies in the fa

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Lennart Regebro
On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka wrote: > I prefer "x = '%s%s%s%s' % (a, b, c, d)" when string's number is more than 3 > and some of them are literal strings. This has the benefit of being slow both on CPython and PyPy. Although using .format() is even slower. :-) //Lennart

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Daniel Holth
On Wed, Feb 13, 2013 at 7:10 AM, Serhiy Storchaka wrote: > On 13.02.13 10:52, Larry Hastings wrote: > >> I've always hated the "".join(array) idiom for "fast" string >> concatenation--it's ugly and it flies in the face of TOOWTDI. I think >> everyone should use "x = a + b + c + d" for string conc

Re: [Python-Dev] efficient string concatenation (yep, from 2004)

2013-02-13 Thread Christian Tismer
On 13.02.13 13:10, Steven D'Aprano wrote: On 13/02/13 10:53, Christian Tismer wrote: Hi friends, _efficient string concatenation_ has been a topic in 2004. Armin Rigo proposed a patch with the name of the subject, more precisely: /[Patches] [ python-Patches-980695 ] efficient string concatenat

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Serhiy Storchaka
On 13.02.13 10:52, Larry Hastings wrote: I've always hated the "".join(array) idiom for "fast" string concatenation--it's ugly and it flies in the face of TOOWTDI. I think everyone should use "x = a + b + c + d" for string concatenation, and we should just make that fast. I prefer "x = '%s%s%s

Re: [Python-Dev] efficient string concatenation (yep, from 2004)

2013-02-13 Thread Steven D'Aprano
On 13/02/13 10:53, Christian Tismer wrote: Hi friends, _efficient string concatenation_ has been a topic in 2004. Armin Rigo proposed a patch with the name of the subject, more precisely: /[Patches] [ python-Patches-980695 ] efficient string concatenation// //on sourceforge.net, on 2004-06-28./

Re: [Python-Dev] efficient string concatenation (yep, from 2004)

2013-02-13 Thread Christian Tismer
On 13.02.13 08:42, Lennart Regebro wrote: Something is needed - a patch for PyPy or for the documentation I guess. Not arguing that it wouldn't be good, but I disagree that it is needed. This is only an issue when you, as in your proof, have a loop that does concatenation. This is usually when

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Steven D'Aprano
On 13/02/13 22:46, Xavier Morel wrote: On 2013-02-13, at 12:37 , Steven D'Aprano wrote: # even less obvious than sum map(operator.add, array) That one does not work, it'll try to call the binary `add` with each item of the array when the map iterator is reified, erroring out. fu

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Steven D'Aprano
On 13/02/13 20:09, Chris Withers wrote: On 12/02/2013 21:03, Maciej Fijalkowski wrote: We recently encountered a performance issue in stdlib for pypy. It turned out that someone committed a performance "fix" that uses += for strings instead of "".join() that was there before. That's... interest

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Xavier Morel
On 2013-02-13, at 12:37 , Steven D'Aprano wrote: > ># even less obvious than sum >map(operator.add, array) That one does not work, it'll try to call the binary `add` with each item of the array when the map iterator is reified, erroring out. functools.reduce(operator.add, array, '')
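
The point made in this message can be checked with a short sketch; the sample list is made up, and Python 3 semantics (lazy map) are assumed.

```python
import functools
import operator

array = ["spam", "eggs", "ham"]  # made-up sample data

# reduce folds the binary add over the sequence, starting from ''.
joined_by_reduce = functools.reduce(operator.add, array, "")

# map passes one item at a time to operator.add, which needs two
# arguments, so it fails as soon as the iterator is consumed.
try:
    list(map(operator.add, array))
    map_failed = False
except TypeError:
    map_failed = True

# The usual idiom, for comparison.
joined_by_join = "".join(array)

print(joined_by_reduce, map_failed)
```

Note that the reduce version still concatenates pairwise, so it shares the repeated-copying cost of a += loop; it is shown only to illustrate why map does not work here.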

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Serhiy Storchaka
On 13.02.13 09:52, Nick Coghlan wrote: On Wed, Feb 13, 2013 at 5:42 PM, Alexandre Vassalotti wrote: I don't think so. Ropes are really useful when you work with gigabytes of data, but unfortunately they don't make good general-purpose strings. Monolithic arrays are much more efficient and simpl

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Steven D'Aprano
On 13/02/13 19:52, Larry Hastings wrote: I've always hated the "".join(array) idiom for "fast" string concatenation --it's ugly and it flies in the face of TOOWTDI. I think everyone should use "x = a + b + c + d" for string concatenation, and we should just make that fast. "".join(array) is

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Serhiy Storchaka
On 12.02.13 23:03, Maciej Fijalkowski wrote: How people feel about generally not having += on long strings in stdlib (since the refcount = 1 thing is a hack)? Sometimes the use of += for strings or bytes is appropriate. For example, I deliberately used += for bytes instead b''.join() (note tha

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Antoine Pitrou
Le Wed, 13 Feb 2013 09:02:07 +0100, Victor Stinner a écrit : > I added a _PyUnicodeWriter internal API to optimize str%args and > str.format(args). It uses a buffer which is overallocated, so it's > basically like CPython str += str optimization. I still don't know how > efficient it is on Windows

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Chris Withers
On 12/02/2013 21:03, Maciej Fijalkowski wrote: We recently encountered a performance issue in stdlib for pypy. It turned out that someone committed a performance "fix" that uses += for strings instead of "".join() that was there before. That's... interesting. I fixed a performance bug in httpli

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Larry Hastings
On 02/12/2013 05:25 PM, Christian Tismer wrote: Ropes have been implemented by Carl-Friedrich Bolz in 2007 as I remember. No idea what the impact was, if any at all. Would ropes be an answer (and a simple way to cope with string mutation patterns) as an alternative implementation, and therefore s

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Lennart Regebro
On Tue, Feb 12, 2013 at 10:03 PM, Maciej Fijalkowski wrote: > Hi > > We recently encountered a performance issue in stdlib for pypy. It > turned out that someone committed a performance "fix" that uses += for > strings instead of "".join() that was there before. Can someone show the actual diff? O

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 10:02 AM, Victor Stinner wrote: > I added a _PyUnicodeWriter internal API to optimize str%args and > str.format(args). It uses a buffer which is overallocated, so it's > basically like CPython str += str optimization. I still don't know how > efficient it is on Windows, sin

Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Victor Stinner
I added a _PyUnicodeWriter internal API to optimize str%args and str.format(args). It uses a buffer which is overallocated, so it's basically like CPython str += str optimization. I still don't know how efficient it is on Windows, since realloc() is slow on Windows (at least on old Windows versions
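
The idea of an overallocated buffer mentioned here can be illustrated in pure Python. The sketch below is not _PyUnicodeWriter and does not reflect its real growth policy; the class OverallocatingWriter, its initial capacity of 16 bytes and the doubling factor are all assumptions chosen for illustration.

```python
class OverallocatingWriter:
    """Hypothetical append-only writer; NOT CPython's _PyUnicodeWriter."""

    def __init__(self):
        self._buf = bytearray(16)  # assumed initial capacity
        self._size = 0             # number of bytes actually used

    def write(self, text):
        data = text.encode("utf-8")
        needed = self._size + len(data)
        if needed > len(self._buf):
            # Grow geometrically (doubling is an assumption) so that a
            # long run of small writes triggers only a few reallocations.
            new_capacity = max(needed, len(self._buf) * 2)
            self._buf.extend(bytes(new_capacity - len(self._buf)))
        self._buf[self._size:needed] = data  # same-length slice assignment
        self._size = needed

    def getvalue(self):
        # Trim the unused tail before decoding.
        return bytes(self._buf[:self._size]).decode("utf-8")


writer = OverallocatingWriter()
for piece in ["abc"] * 100:
    writer.write(piece)
result = writer.getvalue()
print(len(result))
```

The trade-off is the one discussed in the thread: spare capacity makes small appends cheap, but a single large write can leave a large unused tail that has to be trimmed at the end.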