from:"Sammo"

String concatenation performance with +=

2009-02-13 Thread Sammo

String concatenation has been optimized since 2.3, so using += should
be fairly fast.

In my first test, I tried concatenating a 4096-byte string 1000 times
in the following code, and the result was indeed very fast (12.352 ms
on my machine).

import time
t = time.time()
mydata = ""
moredata = "A"*4096
for i in range(1000):
    mydata += moredata # 12.352 ms
print "%0.3f ms"%(1000*(time.time() - t))

However, I got a different result in my second test, which is
implemented in a class with a feed() method. This test took 4653.522
ms on my machine, which is 350x slower than the previous test!

class StringConcatTest:
    def __init__(self):
        self.mydata = ""

    def feed(self, moredata):
        self.mydata += moredata # 4653.522 ms

test = StringConcatTest()
t = time.time()
for i in range(1000):
    test.feed(moredata)
print "%0.3f ms"%(1000*(time.time() - t))

Note that I need to do something to mydata INSIDE the loop, so please
don't tell me to append moredata to a list and then use "".join after
the loop.

Why is the second test so much slower?
--
http://mail.python.org/mailman/listinfo/python-list

Re: String concatenation performance with +=

2009-02-13 Thread Sammo

Okay, this is what I have tried for string concatenation:

1. Using += on a plain variable (12 ms)
2. Using += on an attribute inside a class (4000+ ms)
3. Using "".join inside the loop, without a class (4000+ ms)
4. Using "".join inside the loop, implemented in a class (4000+ ms)

On Feb 14, 3:12 pm, Benjamin Peterson wrote:
> Sammo writes:
>
> > String concatenation has been optimized since 2.3, so using += should
> > be fairly fast.
>
> This is implementation dependent and shouldn't be relied upon.
>
> > Note that I need to do something to mydata INSIDE the loop, so please
> > don't tell me to append moredata to a list and then use "".join after
> > the loop.
>
> Then why not just mutate the list and then call "".join?

AFAIK, using list mutation and "".join only improves performance if
the "".join is executed outside of the loop. In fact, in Python 2.5.2,
using "".join inside the loop is much slower than my original test,
which concatenates using +=.

My original test with simple operations took 12 ms to run:

import time
t = time.time()
mydata = ""
moredata = "A"*4096
for i in range(1000):
    mydata += moredata # 12.352 ms
    # do some stuff to mydata
    # ...
print "%0.3f ms"%(1000*(time.time() - t))

My modified code, which appends to a list and then calls "".join
inside the loop, takes 4417 ms to run. This is much slower!

import time
t = time.time()
mydata = []
moredata = "A"*4096
for i in range(1000):
    mydata.append(moredata)
    mydata = ["".join(mydata)]
    # do some stuff to mydata
    # ...
print "%0.3f ms"%(1000*(time.time() - t))
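A back-of-the-envelope count suggests why re-joining inside the loop is so slow. On pass k the list holds the previously joined string plus one new block, so "".join copies k blocks; over N passes that is block_size * N*(N+1)/2 bytes, the Shlemiel pattern Steven describes later in the thread. The function names below are hypothetical, and the sketch counts only bytes copied by join (no allocation or interpreter overhead), so it is an estimate rather than a measurement.

```python
def bytes_copied_by_rejoining(n_blocks, block_size):
    """Bytes copied when the list is re-joined after every append.

    Pass k (1-based) joins k blocks, so the total is
    block_size * (1 + 2 + ... + n_blocks).
    """
    return block_size * n_blocks * (n_blocks + 1) // 2


def bytes_copied_by_single_join(n_blocks, block_size):
    """Bytes copied when the join is delayed until after the loop."""
    return block_size * n_blocks


print(bytes_copied_by_rejoining(1000, 4096))    # 2050048000
print(bytes_copied_by_single_join(1000, 4096))  # 4096000
```

For the 1000 blocks of 4096 bytes used here, that is roughly 2 GB of copying against roughly 4 MB for a single join after the loop.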

Here is the same approach (list mutation plus "".join) implemented in
a class. It took 4434 ms to run, which is again much slower than the
original test, and about the same speed as using += implemented in a
class.

import time
moredata = "A"*4096
class StringConcatTest:
    def __init__(self):
        self.mydata = []

    def feed(self, moredata):
        self.mydata.append(moredata)
        self.mydata = ["".join(self.mydata)]
        # do some stuff to self.mydata
        # ...

test = StringConcatTest()
t = time.time()
for i in range(1000):
    test.feed(moredata)
print "%0.3f ms"%(1000*(time.time() - t))

> > Why is the second test so much slower?
>
> Probably several reasons:
>
> 1. Function call overhead is quite large compared to these simple operations.
> 2. You are resolving attribute names.

The main problem I need help with is how to improve the performance of
the code implemented in a class. It is currently 350x slower than the
first test using simple operations, so surely there's got to be a way.

Re: String concatenation performance with +=

2009-02-14 Thread Sammo

On Feb 14, 4:47 pm, Steven D'Aprano wrote:
> > Sammo writes:
> >> String concatenation has been optimized since 2.3, so using += should
> >> be fairly fast.
>
> > This is implementation dependent and shouldn't be relied upon.
>
> It's also a fairly simple optimization and really only applies to direct
> object access, not items or attributes.
>
> >> Why is the second test so much slower?
>
> > Probably several reasons:
>
> > 1. Function call overhead is quite large compared to these simple
> > operations. 2. You are resolving attribute names.
>
> 3. Because the optimization isn't applied in this case.

Thanks Steven -- that's the answer I was looking for. The += string
concatenation optimization only applies to plain variable names (such
as mydata in my first test), not to attributes. In my case, I
incorrectly assumed it applied to attributes as well.

Can you point me to any references regarding the limitations of the
optimization? I guess it's another Python idiom that's documented
somewhere ...
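As far as the CPython 2.x source goes, the in-place shortcut is taken only when the instruction right after the addition stores to a plain name (STORE_FAST, STORE_DEREF or STORE_NAME) and the string has no other references; an attribute store never qualifies, and as Benjamin notes, none of this is guaranteed by the language. The sketch below is a hedged illustration for Python 3.4+ (it relies on dis.get_instructions, which Python 2 lacks) and only shows the bytecode difference between the two tests. The function names are hypothetical, and it looks only at the store instructions because the opcode used for the addition itself differs between versions.

```python
import dis


class StringConcatTest(object):
    def __init__(self):
        self.mydata = ""

    def feed(self, moredata):
        # Augmented assignment whose target is an attribute.
        self.mydata += moredata


def concat_to_local(mydata, moredata):
    # Augmented assignment whose target is a plain local name.
    # The result is deliberately discarded: only the bytecode matters here.
    mydata += moredata


def store_opnames(func):
    """Names of the STORE_* instructions in func's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith("STORE_")]


print(store_opnames(concat_to_local))
print(store_opnames(StringConcatTest.feed))
```

The local version should report a STORE_FAST store and the class version a STORE_ATTR store, which is consistent with Steven's point 3; the bytecode alone does not show the reference-count condition.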

Re: String concatenation performance with +=

2009-02-14 Thread Sammo

On Feb 14, 5:33 pm, Steven D'Aprano wrote:
> > AFAIK, using list mutation and "".join only improves performance if
> > the "".join is executed outside of the loop.
>
> Naturally. If you needlessly join over and over again, instead of delaying
> until the end, then you might as well do string concatenation without the
> optimization.
>
> join() isn't some magical function that makes your bugs go away. You have to
> use it sensibly. What you've done is a variant of Shlemiel the
> road-painter's algorithm:
>
> http://www.joelonsoftware.com/articles/fog000319.html
>
> Perhaps you have to repeatedly do work on the *temporary* results in between
> loops? Something like this toy example?
>
> s = ""
> block = "abcdefghijklmnopqrstuvwxyz"*1000
> for i in xrange(1000):
>     s += block
>     # do something with the partially concatenated string
>     print "partial length is", len(s)
> # all finished
> do_something(s)
>
> You've written that using join:
>
> L = []
> block = "abcdefghijklmnopqrstuvwxyz"*1000
> for i in xrange(1000):
>     L.append(block)
>     # do something with the partially concatenated string
>     L = [ ''.join(L) ]
>     print "partial length is", len(L[0])
> # all finished
> s = ''.join(L)
> do_something(s)
>
> Okay, but we can re-write that more sensibly:
>
> L = []
> block = "abcdefghijklmnopqrstuvwxyz"*1000
> for i in xrange(1000):
>     L.append(block)
>     # do something with the partially concatenated string
>     print "partial length is", sum(len(item) for item in L)
> # all finished
> s = ''.join(L)
> do_something(s)
>
> There's still a Shlemiel algorithm in there, but it's executed in fast C
> instead of slow Python and it doesn't involve copying large blocks of
> memory, only walking them, so you won't notice as much. Can we be smarter?
>
> L = []
> block = "abcdefghijklmnopqrstuvwxyz"*1000
> partial_length = 0
> for i in xrange(1000):
>     L.append(block)
>     partial_length += len(block)
>     # do something with the partially concatenated string
>     print "partial length is", partial_length
> # all finished
> s = ''.join(L)
> do_something(s)
>
> Naturally this is a toy example, but I think it illustrates one technique
> for avoiding turning an O(N) algorithm into an O(N**2) algorithm.

So even though I can't delay the "".join operation until after the
loop, I can still improve performance by reducing the length of
"".join operations inside the loop. In my case, I need to temporarily
work on the latest two blocks only.

L = []
block = "abcdefghijklmnopqrstuvwxyz"*1000
for i in xrange(1000):
    L.append(block)
    # do something with the latest two blocks
    tmp = "".join(L[-2:])
# all finished
s = ''.join(L)
do_something(s)
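Since the original concern was the class-based feed() version, here is a minimal sketch of how the "latest two blocks" loop might be wrapped in such a class, combined with Steven's running-length idea. The class and method names are hypothetical, "do something with the latest two blocks" is represented by simply returning the joined tail, and range replaces xrange so that it runs on Python 3.

```python
class LatestBlocksFeeder(object):
    """Hypothetical class-based version of the "latest two blocks" loop.

    Blocks are kept in a list; the full history is joined only once, in
    getvalue(), instead of on every call to feed().
    """

    def __init__(self):
        self.blocks = []
        self.length = 0  # running total, like partial_length above

    def feed(self, block):
        self.blocks.append(block)
        self.length += len(block)
        # Join only the newest two blocks: each call copies at most two
        # blocks, however much data has been fed so far.
        return "".join(self.blocks[-2:])

    def getvalue(self):
        # A single full join once all the feeding is finished.
        return "".join(self.blocks)


feeder = LatestBlocksFeeder()
block = "abcdefghijklmnopqrstuvwxyz" * 1000
for i in range(1000):
    latest = feeder.feed(block)
    # ... work on `latest` (the newest two blocks) here ...
s = feeder.getvalue()
```

Each call still pays for the method call and attribute lookups Benjamin mentions, but the cost per call no longer grows with the amount of data already fed.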

Unfortunately, the downside is that the code becomes more difficult to
read compared to using the obvious +=. If only the optimization worked
for attributes ...
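One way around the attribute limitation that does not rely on the CPython shortcut at all is to accumulate into a mutable buffer. For byte data, bytearray's += extends the existing object in place, so it behaves the same whether the target is a local name or an attribute. This is a sketch under stated assumptions: the data is bytes (like "A"*4096), the per-pass work can operate on the bytearray directly (converting it with bytes() on every pass would copy it again), and bytearray needs Python 2.6 or later, so it was not available in the 2.5.2 used in this thread. In CPython the buffer is over-allocated as it grows, so repeated appends are cheap on average.

```python
class StringConcatTest(object):
    """Same shape as the original class, but backed by a mutable bytearray."""

    def __init__(self):
        self.mydata = bytearray()

    def feed(self, moredata):
        # bytearray.__iadd__ extends the existing buffer in place and returns
        # it, so the attribute is rebound to the same object, not to a copy.
        self.mydata += moredata


test = StringConcatTest()
moredata = b"A" * 4096
for i in range(1000):
    test.feed(moredata)
    # test.mydata can be inspected here on every pass, e.g. with
    # test.mydata.find(b"...") or slicing, without joining anything.
print(len(test.mydata))  # 4096000
```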

What's so wrong about execfile?

2009-02-27 Thread Sammo

Given that execfile has been removed in py3k, I want to understand
exactly why.

Okay, I get that execfile is bad from the following thread:

On Jul 29 2007, 2:39 pm, Steven D'Aprano wrote:
> (1) Don't use eval, exec or execfile.
>
> (2) If you're an expert, don't use eval, exec or execfile.
>
> (3) If you're an expert, and are fully aware of the security risks, don't
> use eval, exec or execfile.
>
> (4) If you're an expert, and are fully aware of the security risks, and
> have a task that can only be solved by using eval, exec or execfile, find
> another solution.
>
> (5) If there really is no other solution, you haven't looked hard enough.
>
> (6) If you've looked REALLY hard, and can't find another solution, AND
> you're an expert and are fully aware of the security risks, THEN you can
> think about using eval, exec or execfile.

What are some of the reasons why execfile should not be used?

What are some examples of cases where execfile is the correct way of
doing something?
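For reference, a rough Python 3 stand-in for execfile: read the file, compile it with its real filename, and exec the result in a namespace dictionary, which is roughly what the 2to3 execfile fixer emits (exec(compile(open(fn, "rb").read(), fn, 'exec'))). The standard library's runpy.run_path covers the same ground and returns the resulting globals. The name execfile_py3 and the settings file are hypothetical, and unlike Python 2's execfile(path), which defaults to the caller's globals and locals, this sketch uses a fresh dictionary unless one is passed in. Running a trusted configuration file written in Python is one use often given; every caveat in the quoted list still applies, because the file runs with full interpreter privileges.

```python
import os
import runpy
import tempfile


def execfile_py3(path, namespace=None):
    """Rough Python 3 stand-in for Python 2's execfile(path, namespace)."""
    if namespace is None:
        namespace = {}
    with open(path, "rb") as f:
        source = f.read()
    # Compiling with the real filename keeps tracebacks pointing at the file.
    code = compile(source, path, "exec")
    exec(code, namespace)
    return namespace


# A trusted, hypothetical settings file, written out just for this demo.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("DEBUG = True\nWORKERS = 2 * 2\n")
    settings_path = f.name

try:
    settings = execfile_py3(settings_path)
    print(settings["WORKERS"])  # 4
    # runpy.run_path does the same job and returns the resulting globals.
    print(runpy.run_path(settings_path)["DEBUG"])  # True
finally:
    os.remove(settings_path)
```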
