Maybe this code will be faster? (Largely untested, so check that it
even does the same thing.)
filehandle = open("data", 'r', buffering=1000)
fileIter = iter(filehandle)
# read the first line up front so every later line has a predecessor
lastLine = next(fileIter)  # next(it) works on 2.6+ and 3.x; it.next() is 2.x only
lastTokens = lastLine.strip().split(delimiter)
lastGeno = extract(lastTokens[0])
for currentLine in fileIter:
    currentTokens = currentLine.strip().split(delimiter)
    currentGeno = extract(currentTokens[0])
    # each line is split and extracted once, then reused as "last"
    if lastGeno == currentGeno:
        table.markEquivalent(int(lastTokens[1]), int(currentTokens[1]))
    # prepare for next iteration
    lastLine = currentLine
    lastTokens = currentTokens
    lastGeno = currentGeno
I'd be tempted to try a bigger file buffer too, personally.
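
For what it's worth, here is a rough, self-contained sketch of the same loop with a larger buffer, so it can be tried without the rest of your program. The names mark_adjacent_equivalents and process_file are made up for illustration, and the 1 MiB buffer size is a guess rather than a measured optimum; extract and mark_equivalent stand in for your own extract() and table.markEquivalent.

```python
def mark_adjacent_equivalents(lines, extract, mark_equivalent, delimiter="\t"):
    """Compare each line with the one before it and report matching pairs.

    `lines` is any iterable of text lines (an open file works).
    `extract` and `mark_equivalent` are hypothetical stand-ins for the
    poster's own extract() and table.markEquivalent().
    """
    line_iter = iter(lines)
    try:
        last_tokens = next(line_iter).strip().split(delimiter)
    except StopIteration:
        return  # empty input: nothing to compare
    last_geno = extract(last_tokens[0])
    for current_line in line_iter:
        current_tokens = current_line.strip().split(delimiter)
        current_geno = extract(current_tokens[0])
        if last_geno == current_geno:
            mark_equivalent(int(last_tokens[1]), int(current_tokens[1]))
        # the current line becomes the predecessor of the next one
        last_tokens, last_geno = current_tokens, current_geno


def process_file(path, extract, mark_equivalent, delimiter="\t",
                 buffer_size=1024 * 1024):
    """Run the comparison over a file opened with a larger buffer.

    buffer_size is an assumption (1 MiB); time a few values on real data.
    """
    with open(path, "r", buffering=buffer_size) as filehandle:
        mark_adjacent_equivalents(filehandle, extract, mark_equivalent, delimiter)
```

You would call it as process_file("data", extract, table.markEquivalent, delimiter). Whether the bigger buffer actually helps depends on the file and the platform, so it is worth timing both versions before keeping either.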
--
Ben Sizer