Shane O'Connor wrote:
Hi,

First-time poster here. I've a question about loop efficiency - I was
wondering whether this code:

i = 0
while i < 1000:
    do something
    i+=1

is more efficient than:

for i in range(1000):
    do something

or:

for i in xrange(1000):
    do something


You can test this with the timeit module. xrange is about 50% faster than range for a million items, due to the overhead of building the full list:

[steve@sylar ~]$ python -m timeit "for n in range(1000000): pass"
10 loops, best of 3: 156 msec per loop
[steve@sylar ~]$ python -m timeit "for n in xrange(1000000): pass"
10 loops, best of 3: 106 msec per loop

(The "10 loops" part means that timeit repeats the entire code snippet ten times, in order to minimize the effect of other processes and background tasks.)

However, if you move building the list into setup code, which doesn't get timed, you will see that there's hardly any difference in time for iterating over a list and an xrange object:

[steve@sylar ~]$ python -m timeit -s "items = range(1000000)" "for n in items: pass"
10 loops, best of 3: 96.1 msec per loop
[steve@sylar ~]$ python -m timeit -s "items = xrange(1000000)" "for n in items: pass"
10 loops, best of 3: 99.3 msec per loop


As for the while loop, it doesn't even come close!

[steve@sylar ~]$ python -m timeit "n = 0
> while n < 1000000: n += 1"
10 loops, best of 3: 282 msec per loop


The problem is that in a while loop, the looping itself is handled in slow Python code, while in the for loop, it's handled by fast C code (or whatever language the Python virtual machine is written in).

We can see this by disassembling the code into virtual machine instructions:


[steve@sylar ~]$ python
Python 2.5 (r25:51908, Nov  6 2007, 16:54:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from dis import dis
>>> dis(compile("for x in xrange(1000000): pass", "", "exec"))
  1           0 SETUP_LOOP              20 (to 23)
              3 LOAD_NAME                0 (xrange)
              6 LOAD_CONST               0 (1000000)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_NAME               1 (x)
             19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               1 (None)
             26 RETURN_VALUE

The for-loop is three virtual instructions: FOR_ITER, STORE_NAME and JUMP_ABSOLUTE. Compare that to the while loop:

>>> dis(compile("n = 0\nwhile n < 1000000: n += 1", "", "exec"))
  1           0 LOAD_CONST               0 (0)
              3 STORE_NAME               0 (n)

  2           6 SETUP_LOOP              28 (to 37)
        >>    9 LOAD_NAME                0 (n)
             12 LOAD_CONST               1 (1000000)
             15 COMPARE_OP               0 (<)
             18 JUMP_IF_FALSE           14 (to 35)
             21 POP_TOP
             22 LOAD_NAME                0 (n)
             25 LOAD_CONST               2 (1)
             28 INPLACE_ADD
             29 STORE_NAME               0 (n)
             32 JUMP_ABSOLUTE            9
        >>   35 POP_TOP
             36 POP_BLOCK
        >>   37 LOAD_CONST               3 (None)
             40 RETURN_VALUE


So the for loop pushes more of the actual effort into the underlying engine, which is faster.

Of course, in real-life code, most of this is just theoretical. I see from your comment below that you understand this, but for the benefit of others reading, most of the time, the overhead of the loop body will be much bigger than the loop itself. Who cares whether a loop takes 30 seconds + 100 milliseconds, or 30 seconds + 150 milliseconds? You aren't going to notice the difference.

The general advice is best summed up with a quote from the famous computer scientist Donald Knuth:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified."

http://en.wikipedia.org/wiki/Program_optimization

Other quotes of note:

"More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity." - W.A. Wulf

"The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet." - Michael A. Jackson


In my mind, the while loop should not allocate as much memory as range or
have the overhead of the iterator of xrange (although aesthetically, I
prefer the x/range style). Is this the case or does the compiler do
something clever here?

As a general rule, Python's compiler rarely does anything clever. "Clever" optimizations are remarkable for the number of difficult to notice and even more difficult to fix bugs they introduce.

Nevertheless, the PyPy project is experimenting with optimizing just-in-time compilers, and so far are already running about twice as fast as the standard CPython compiler. Similarly, the experimental WPython compiler also does a lot more optimizations than CPython.

If you're interested in optimization, PyPy is good to look at: for a handful of carefully selected examples, Python code in PyPy is faster than optimized C code!

http://morepypy.blogspot.com/2011/02/pypy-faster-than-c-on-carefully-crafted.html


In particular, I'm using Python 2.4.3 on a web server which needs to run as
fast as possible using as little memory as possible (no surprises there!).
I'm aware that there are more significant optimizations than the above and I
will profile the code rather than prematurely optimize loops at the sake of
readability/errors but I'm still curious about the answer.





--
Steven

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to