[Tutor] Any suggestions for optimizing this code?

grouchy Sun, 18 Sep 2005 21:44:34 -0700

Hello again!

I've been playing with generating 1-D cellular automata, and this is the fastest solution I've come up with so far.

In the makeArray() function below, I have a sliding window of three elements starting at row[-1,0,1] and going to row[-2,-1,0], so that it wraps around at either boundary. I use the results of 4*row[i-1] + row[i] + row[i+1] to convert the three bits to an integer, and fetch a result from rule[index]. The inner loop is simple, and executes a million times, so shaving any time off makes a big difference.

The biggest speedup from things I tried came from making an empty matrix first, to put results into the next row by index, instead of creating new rows on the fly with append. And binding the result[row] and result[row+1] references before the loop. Those two things sped it up by a consistentt 40%. Thanks to Python in a Nutshell!

Does anybody see a faster approach, or a way to optimize the inner loop on makeArray() further? With the current approach, the inner loop does the real work, as it steps over each cell one at a time.

For anybody not familiar with cellular automatas, there is a pretty java animation of how cells get calculated here: http://math.hws.edu/xJava/CA/CA.html

And this is what CAs they look like when you plot them, with 1s as black and 0s as white, and some additional info: http://mathworld.wolfram.com/Rule30.html

I've just hardcoded rule 30 into this for testing. I make a blank matrix with makeMatrix(). Then makeArray(matrix) sets the middle element of the first row in the matrix to 1 to seed the CA, and loops through it two rows at a time, calculating the results of the first row, and putting them into the next row.

Right now I get:

>>> timeArray(5)
Total time: 3.95693028

I am uneasy with the algorithm getting each element one at a time, throwing them away, and getting two that overlap in the next step of the window, but I couldn't come up with a more elegant solution. Also, it seems like a kludgey way to convert the 3 bits into a binary number, but it was the fastest way I stumbled on to.

w = 1000
h = 1000
rule = [0,1,1,1,1,0,0,0]

def makeMatrix(w,h):
    result = [0]*h
    for i in range(h):
        result[i] = [0]*w
    return result

def makeArray(matrix):
    result = matrix
    result[0][w/2] = 1

    for row in range(h-1):
        last = result[row]
        next = result[row+1]
        for i in range(w-1):
            next[i] = rule[4*last[i-1]+2*last[i]+last[i+1]]
        next[i+1] = rule[4*last[i]+2*last[i+1]+last[0]]
    return result

def timeArray(num):
    import time
    matrix = makeMatrix(w,h)
    t1 = time.clock()
    for i in range(num):
        makeArray(matrix)
    t2 = time.clock()
    print "Total time:", t2 - t1

timeArray(5)

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

[Tutor] Any suggestions for optimizing this code?

Reply via email to