Prasad, Ramit wrote:
But more importantly, some years ago (Python 2.4, about 8 years ago?) the
Python developers found a really neat trick that they can do to optimize
string concatenation so it doesn't need to repeatedly copy characters over
and over and over again. I won't go into details, but the thing is, this trick
works well enough that repeated concatenation is about as fast as the join
method MOST of the time.

I would like to learn a bit more about the trick if you have a reference handy. I have no intention of using it, but it sounds interesting and might teach me more about Python internals.

In a nutshell, CPython identifies cases like:

mystr = mystr + otherstr
mystr += otherstr

where nothing else holds a reference to the string bound to mystr. When it can, CPython resizes that string in place and appends otherstr to it, rather than copying both into a new string object.
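If you want to see the effect for yourself, here is a rough sketch that times three ways of building the same string. The function names and the sizes are just ones I made up for illustration, and the numbers you get will depend on your Python version and platform, so treat this as an experiment rather than a benchmark.

```python
import timeit

def concat_in_place(pieces):
    # Only the local name s refers to the growing string, so CPython
    # may be able to resize it in place instead of copying it.
    s = ""
    for p in pieces:
        s += p
    return s

def concat_with_alias(pieces):
    # A second reference to the string exists at the moment of the +=,
    # so the in-place resize is not allowed and the string is copied.
    s = ""
    for p in pieces:
        alias = s
        s += p
    return s

def concat_with_join(pieces):
    # The portable idiom: one pass, one final allocation.
    return "".join(pieces)

pieces = ["x" * 10] * 5000  # arbitrary test data

for func in (concat_in_place, concat_with_alias, concat_with_join):
    seconds = timeit.timeit(lambda: func(pieces), number=5)
    print("%-20s %.4f s" % (func.__name__, seconds))
```

On a CPython where the optimization applies, I would expect the aliased version to be noticeably slower than the other two, but that is exactly the kind of thing that varies between builds.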

The "if possible" hides a lot of technical detail, which is why the optimization can fail on some platforms while working on others. See this painful discussion trying to debug httplib slowness:

http://mail.python.org/pipermail/python-dev/2009-August/091125.html

After many dead-ends and red herrings, somebody spotted the problem:

http://mail.python.org/pipermail/python-dev/2009-September/091582.html

ending with GvR admitting that it was an embarrassment that repeated string concatenation had survived in the standard library for so long. The author even knew it was slow because he put a comment warning about it!
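The usual way to avoid depending on the optimization at all is to collect the pieces in a list and join them once at the end. This is only a minimal sketch of that idiom, not the actual httplib code; read_all and chunks are hypothetical names.

```python
def read_all(chunks):
    # Hypothetical helper: accumulate pieces in a list, which is cheap
    # to append to, then build the final string in a single join.
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)

print(read_all(["GET ", "/index.html ", "HTTP/1.1"]))
# prints: GET /index.html HTTP/1.1
```

This does the same amount of work no matter which interpreter or platform runs it.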

Here is the middle of the 2004 discussion about adding the optimization:

http://mail.python.org/pipermail/python-dev/2004-August/046695.html

which talks about the possibility of other implementations doing something similar. You can find the beginning of the discussion yourself :)

And here is a good description of the optimization itself:

http://utcc.utoronto.ca/~cks/space/blog/python/ExaminingStringConcatOpt




--
Steven
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
