I have a million-line text file with 100 characters per line, and simply need to determine how many of the lines are distinct.
On my PC, this little program just goes to never-never land:
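def number_distinct(fn):
    f = open(fn)
    x = f.readline().strip()
    L = []                    # distinct lines seen so far
    while x != '':            # readline() returns '' at end of file
        if x not in L:        # scans the whole list on every line
            L = L + [x]       # builds a new list each time
        x = f.readline().strip()
    return len(L)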
Would anyone care to point out improvements?
Is there a better algorithm for doing this?
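
For what it's worth, one direction that might help is keeping the seen lines in a set instead of a list, so that each membership test is a hash lookup rather than a scan of everything collected so far. The following is only a sketch, assuming the distinct lines fit comfortably in memory; the name count_distinct_lines is made up for illustration:

```python
def count_distinct_lines(path):
    """Return the number of distinct lines in the text file at `path`."""
    seen = set()
    with open(path) as f:
        # Iterating over the file yields one line at a time until EOF,
        # so a blank line in the middle does not end the loop early.
        for line in f:
            # strip() mirrors the normalization used in number_distinct;
            # adding an element that is already in the set is a no-op.
            seen.add(line.strip())
    return len(seen)
```

Called as count_distinct_lines("data.txt") (a hypothetical file name), this reads the file once. Unlike the loop above, it counts an empty line as one more distinct value instead of treating it as end of file.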
