String concatenation performance with +=
String concatenation has been optimized since 2.3, so using += should be fairly fast.

In my first test, I tried concatenating a 4096-byte string 1000 times in the following code, and the result was indeed very fast (12.352 ms on my machine).

    import time
    t = time.time()
    mydata = ""
    moredata = "A"*4096
    for i in range(1000):
        mydata += moredata # 12.352 ms
    print "%0.3f ms"%(1000*(time.time() - t))

However, I got a different result in my second test, which is implemented in a class with a feed() method. This test took 4653.522 ms on my machine, which is roughly 350x slower than the previous test!

    class StringConcatTest:
        def __init__(self):
            self.mydata = ""
        def feed(self, moredata):
            self.mydata += moredata # 4653.522 ms

    test = StringConcatTest()
    t = time.time()
    for i in range(1000):
        test.feed(moredata)
    print "%0.3f ms"%(1000*(time.time() - t))

Note that I need to do something to mydata INSIDE the loop, so please don't tell me to append moredata to a list and then use "".join after the loop.

Why is the second test so much slower?

-- 
http://mail.python.org/mailman/listinfo/python-list
Re: String concatenation performance with +=
Okay, this is what I have tried for string concatenation:

1. Using += implemented using simple operations (12 ms)
2. Using += implemented inside a class (4000+ ms)
3. Using "".join implemented using simple operations (4000+ ms)
4. Using "".join implemented inside a class (4000+ ms)

On Feb 14, 3:12 pm, Benjamin Peterson wrote:
> Sammo writes:
>
> > String concatenation has been optimized since 2.3, so using += should
> > be fairly fast.
>
> This is implementation dependent and shouldn't be relied upon.
>
> > Note that I need to do something to mydata INSIDE the loop, so please
> > don't tell me to append moredata to a list and then use "".join after
> > the loop.
>
> Then why not just mutate the list and then call "".join?

AFAIK, using list mutation and "".join only improves performance if the "".join is executed outside of the loop. In fact, in Python 2.5.2, using "".join inside the loop is much slower than my original test, which concatenates using +=.

My original test with simple operations took 12 ms to run:

    import time
    t = time.time()
    mydata = ""
    moredata = "A"*4096
    for i in range(1000):
        mydata += moredata # 12.352 ms
        # do some stuff to mydata
        # ...
    print "%0.3f ms"%(1000*(time.time() - t))

New code, modified to mutate the list and then call "".join, now takes 4417 ms to run. This is much slower!

    import time
    t = time.time()
    mydata = []
    moredata = "A"*4096
    for i in range(1000):
        mydata.append(moredata)
        mydata = ["".join(mydata)]
        # do some stuff to mydata
        # ...
    print "%0.3f ms"%(1000*(time.time() - t))

Using list mutation and "".join, implemented in a class. This took 4434 ms to run, which is again much slower than the original test. Note that it is about the same speed as using += implemented in a class.

    import time
    moredata = "A"*4096
    class StringConcatTest:
        def __init__(self):
            self.mydata = []
        def feed(self, moredata):
            self.mydata.append(moredata)
            self.mydata = ["".join(self.mydata)]
            # do some stuff to self.mydata
            # ...

    test = StringConcatTest()
    t = time.time()
    for i in range(1000):
        test.feed(moredata)
    print "%0.3f ms"%(1000*(time.time() - t))

> > Why is the second test so much slower?
>
> Probably several reasons:
>
> 1. Function call overhead is quite large compared to these simple operations.
> 2. You are resolving attribute names.

The main problem I need help with is how to improve the performance of the code implemented in a class. It is currently roughly 350x slower than the first test using simple operations, so surely there's got to be a way.
Re: String concatenation performance with +=
On Feb 14, 4:47 pm, Steven D'Aprano wrote:
> > Sammo writes:
> >> String concatenation has been optimized since 2.3, so using += should
> >> be fairly fast.
>
> > This is implementation dependent and shouldn't be relied upon.
>
> It's also a fairly simple optimization and really only applies to direct
> object access, not items or attributes.
>
> >> Why is the second test so much slower?
>
> > Probably several reasons:
>
> > 1. Function call overhead is quite large compared to these simple
> > operations.
> > 2. You are resolving attribute names.
>
> 3. Because the optimization isn't applied in this case.

Thanks Steven -- that's the answer I was looking for. The += string concatenation optimization only applies to local variables. In my case, I incorrectly assumed it applied to attributes.

Can you point me to any references regarding the limitations of the optimization? I guess it's another Python idiom that's documented somewhere ...
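One way to see the difference Steven describes is to look at the bytecode. The sketch below is written for Python 3 (Python 2's dis module has no get_instructions), and the names local_concat, Holder and store_ops are made up for illustration. As far as I understand CPython's implementation, the in-place shortcut is only attempted when the instruction right after the add stores straight back into a plain name; an attribute target compiles to STORE_ATTR instead, so the shortcut is skipped. Treat this as an implementation detail of CPython, not a language guarantee.

```python
import dis


def local_concat(s, block):
    # Target is a local name: the add is followed by a STORE_FAST-style store.
    s += block
    return s


class Holder(object):
    def __init__(self):
        self.mydata = ""

    def feed(self, moredata):
        # Target is an attribute: the add is followed by STORE_ATTR.
        self.mydata += moredata


def store_ops(func):
    """Return the names of all STORE_* opcodes in func's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith("STORE_")]


print(store_ops(local_concat))
print(store_ops(Holder.feed))
```

The exact opcode names vary a little between CPython versions, which is why the helper only filters on the STORE_ prefix.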
Re: String concatenation performance with +=
On Feb 14, 5:33 pm, Steven D'Aprano wrote:
> > AFAIK, using list mutation and "".join only improves performance if
> > the "".join is executed outside of the loop.
>
> Naturally. If you needlessly join over and over again, instead of delaying
> until the end, then you might as well do string concatenation without the
> optimization.
>
> join() isn't some magical function that makes your bugs go away. You have to
> use it sensibly. What you've done is a variant of Shlemiel the
> road-painter's algorithm:
>
> http://www.joelonsoftware.com/articles/fog000319.html
>
> Perhaps you have to repeatedly do work on the *temporary* results in between
> loops? Something like this toy example?
>
> s = ""
> block = "abcdefghijklmnopqrstuvwxyz"*1000
> for i in xrange(1000):
>     s += block
>     # do something with the partially concatenated string
>     print "partial length is", len(s)
> # all finished
> do_something(s)
>
> You've written that using join:
>
> L = []
> block = "abcdefghijklmnopqrstuvwxyz"*1000
> for i in xrange(1000):
>     L.append(block)
>     # do something with the partially concatenated string
>     L = [ ''.join(L) ]
>     print "partial length is", len(L[0])
> # all finished
> s = ''.join(L)
> do_something(s)
>
> Okay, but we can re-write that more sensibly:
>
> L = []
> block = "abcdefghijklmnopqrstuvwxyz"*1000
> for i in xrange(1000):
>     L.append(block)
>     # do something with the partially concatenated string
>     print "partial length is", sum(len(item) for item in L)
> # all finished
> s = ''.join(L)
> do_something(s)
>
> There's still a Shlemiel algorithm in there, but it's executed in fast C
> instead of slow Python and it doesn't involve copying large blocks of
> memory, only walking them, so you won't notice as much. Can we be smarter?
>
> L = []
> block = "abcdefghijklmnopqrstuvwxyz"*1000
> partial_length = 0
> for i in xrange(1000):
>     L.append(block)
>     partial_length += len(block)
>     # do something with the partially concatenated string
>     print "partial length is", partial_length
> # all finished
> s = ''.join(L)
> do_something(s)
>
> Naturally this is a toy example, but I think it illustrates one technique
> for avoiding turning an O(N) algorithm into an O(N**2) algorithm.

So even though I can't delay the "".join operation until after the loop, I can still improve performance by reducing the length of the "".join operations inside the loop. In my case, I need to temporarily work on the latest two blocks only.

    L = []
    block = "abcdefghijklmnopqrstuvwxyz"*1000
    for i in xrange(1000):
        L.append(block)
        # do something with the latest two blocks
        tmp = "".join(L[-2:])
    # all finished
    s = ''.join(L)
    do_something(s)

Unfortunately, the downside is that the code becomes more difficult to read compared to using the obvious +=. If only the optimization worked for attributes ...
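Applied to the class from the original post, the "latest two blocks" idea might look like the sketch below. ChunkedConcat, window and getvalue are hypothetical names, and the window size of two is taken from the message above. Each feed() call joins at most two chunks, and the full string is only built when getvalue() is called.

```python
class ChunkedConcat(object):
    """Collects chunks in a list; joins everything only on request."""

    def __init__(self, window=2):
        self.chunks = []
        self.window = window  # how many trailing chunks feed() works on

    def feed(self, moredata):
        self.chunks.append(moredata)
        # Join only the newest chunks, so the cost per call does not
        # grow with the total amount of data fed so far.
        recent = "".join(self.chunks[-self.window:])
        # ... do something with `recent` here ...
        return recent

    def getvalue(self):
        # One full join, paid once when the complete string is needed.
        return "".join(self.chunks)


test = ChunkedConcat()
moredata = "A" * 4096
for i in range(1000):
    latest = test.feed(moredata)
print(len(latest), len(test.getvalue()))
```

This keeps the list bookkeeping inside the class, so callers still just call feed(), which may help with the readability concern.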
What's so wrong about execfile?
Given that execfile has been removed in py3k, I want to understand exactly why.

Okay, I get that execfile is bad from the following thread:

On Jul 29 2007, 2:39 pm, Steven D'Aprano wrote:
> (1) Don't use eval, exec or execfile.
>
> (2) If you're an expert, don't use eval, exec or execfile.
>
> (3) If you're an expert, and are fully aware of the security risks, don't
> use eval, exec or execfile.
>
> (4) If you're an expert, and are fully aware of the security risks, and
> have a task that can only be solved by using eval, exec or execfile, find
> another solution.
>
> (5) If there really is no other solution, you haven't looked hard enough.
>
> (6) If you've looked REALLY hard, and can't find another solution, AND
> you're an expert and are fully aware of the security risks, THEN you can
> think about using eval, exec or execfile.

What are some of the reasons why execfile should not be used?

What are some examples of cases where execfile is the correct way of doing something?
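For reference, the Python 3.0 release notes suggest exec(open(fn).read()) in place of execfile(fn). The sketch below wraps that in a small helper. run_file is a hypothetical name, and the compile() step is an addition here so that tracebacks mention the file name. It carries the same security concerns as execfile, since whatever is in the file runs with the caller's full privileges.

```python
import os
import tempfile


def run_file(path, namespace=None):
    """Run the Python source at `path` and return the resulting namespace."""
    if namespace is None:
        namespace = {}
    with open(path) as f:
        source = f.read()
    # compile() with the real path keeps tracebacks pointing at the file.
    code = compile(source, path, "exec")
    exec(code, namespace)
    return namespace


# Usage: write a tiny script to a temporary file and run it.
fd, script_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as f:
    f.write("answer = 6 * 7\n")
try:
    result = run_file(script_path)
finally:
    os.remove(script_path)
print(result["answer"])
```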
