"r.e.s." <[EMAIL PROTECTED]> writes:
> I have a million-line text file with 100 characters per line,
> and simply need to determine how many of the lines are distinct.
I'd generalise it by allowing the caller to pass any iterable of
items. A file handle can be iterated line by line, but so can any
other sequence or iterable.
def count_distinct(seq):
    """ Count the number of distinct items """
    counts = dict()
    for item in seq:
        # First sighting of this item: start its tally at zero
        if item not in counts:
            counts[item] = 0
        counts[item] += 1
    # Each distinct item is exactly one key
    return len(counts)
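The per-item tallies are never read back, so a set would give the same
answer with less bookkeeping. A minimal sketch, assuming Python 3 and
hashable items; the name count_distinct_set is made up here for
illustration and is not part of the function above:

```python
def count_distinct_set(seq):
    """Count the number of distinct items in any iterable of hashable items."""
    # A set keeps a single copy of each item, so its size is the distinct count.
    return len(set(seq))
```

Like the dict version, this holds every distinct item in memory, so for
a million 100-character lines the cost depends on how many of them
really are distinct.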
>>> infile = file('foo.txt')
>>> for line in file('foo.txt'):
...     print line,
...
abc
def
ghi
abc
ghi
def
xyz
abc
abc
def
>>> infile = file('foo.txt')
>>> print count_distinct(infile)
5
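One thing to watch when feeding a file to a counter like this: each
line keeps its trailing newline, so a final line without one compares
unequal to an otherwise identical earlier line. A sketch of one way to
normalise that, assuming Python 3; io.StringIO stands in for a real
file, and count_distinct_lines is a hypothetical helper name:

```python
import io

def count_distinct_lines(lines):
    """Count distinct lines, ignoring each line's trailing newline."""
    # Strip only the newline, so other whitespace still distinguishes lines.
    return len({line.rstrip("\n") for line in lines})

# Stand-in for a file whose last line has no trailing newline.
sample = io.StringIO("abc\ndef\nabc\ndef")

raw_count = len(set(sample))                 # "def\n" and "def" differ
sample.seek(0)                               # rewind before iterating again
stripped_count = count_distinct_lines(sample)

print(raw_count)       # 3
print(stripped_count)  # 2
```

Whether to strip at all depends on what "distinct" should mean for the
data at hand.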
--
\ "A man may be a fool and not know it -- but not if he is |
`\ married." -- Henry L. Mencken |
_o__) |
Ben Finney