Brian van den Broek wrote:
Wolfram Kraus said unto the world upon 2005-01-27 03:24:
<SNIP>
This whole part can be rewritten (without sorting, but in Py2.4 you can use sorted() for this) with a list comprehension (Old Python2.1 style, with a newer version the keys() aren't needed):
for k, v in [(k, items_dict[k])
             for k in items_dict.keys() if items_dict[k] > 1]:
    print '%s occurred %s times' % (k, v)
I think it is clearer to filter the list as it is printed. And dict.iteritems() is handy here, too.
for k, v in items_dict.iteritems():
    if v > 1:
        print '%s occurred %s times' % (k, v)
Kent
Hi all,
Incorporating some of Wolfram's and Kent's suggestions (I hope I've missed no one):
<code>
def dups_in_list_report(a_list):
    '''Prints a duplication report for a list.'''

    items_dict = {}

    for i in a_list:
        items_dict[i] = items_dict.get(i, 0) + 1

    for k, v in sorted(items_dict.iteritems()):   # cf below
        if v > 1:
            print '%s occurred %s times' % (k, v)
</code>
And, I can't but agree that this is much better! Thanks folks.
In trying to improve the code, I first had:
for key in sorted(items_dict.keys()):
    if items_dict[key] > 1:
        print '%s occurred %s times' % (key, items_dict[key])
in place of the for loop over .iteritems(). Am I right in thinking that the advantage of Kent's suggestion of .iteritems() is that it eliminates some of the dict lookups? Other advantages?
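To make the comparison concrete, here is a minimal sketch of the two loops side by side. It is only an illustration: the function names and the sample list are made up, the functions collect the report lines in a list rather than printing them directly, and it uses .items() instead of .iteritems() so that it also runs on later Python versions.

```python
def report_by_key(items_dict):
    """Loop over sorted keys; each reported item costs two items_dict[key] lookups."""
    lines = []
    for key in sorted(items_dict.keys()):
        if items_dict[key] > 1:                       # first lookup
            lines.append('%s occurred %s times'
                         % (key, items_dict[key]))    # second lookup
    return lines


def report_by_item(items_dict):
    """Loop over sorted (key, value) pairs; the value arrives with the key."""
    lines = []
    for k, v in sorted(items_dict.items()):
        if v > 1:
            lines.append('%s occurred %s times' % (k, v))
    return lines


# Hypothetical sample data, counted the same way as in the functions above.
counts = {}
for i in ['a', 'b', 'a', 'c', 'b', 'a']:
    counts[i] = counts.get(i, 0) + 1

# Both loops produce the same report.
assert report_by_key(counts) == report_by_item(counts)
for line in report_by_item(counts):
    print(line)
```

Both versions give the same report. The second simply never goes back to the dict for the value, and I find it reads a little more directly. Whether the saved lookups matter for speed is something I would only trust after timing it.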
Finally, in the first instance, I was aiming for the OP's stated end. To make this more general and reusable, I think I'd do:
<code>
def get_list_dup_dict(a_list, threshold=1):
    '''Returns a dict of items in a_list that occur at least threshold many times

    threshold defaults to 1. The dict returned has items occurring at least
    threshold many times as keys, and number of occurrences as values.
    '''

    items_dict, dup_dict = {}, {}   # Question below

    for i in a_list:
        items_dict[i] = items_dict.get(i, 0) + 1

    for k, v in items_dict.iteritems():
        if v >= threshold:
            dup_dict[k] = v   # Question below

    return dup_dict

def print_list_dup_report(a_list, threshold=1):
    '''Prints report of items in a_list occurring at least threshold many times

    threshold defaults to 1. get_list_dup_dict(a_list, threshold) is called,
    returning a dict of items in a_list that occur at least threshold many
    times as keys and their number of repetitions as values.

    This dict is looped over to print a sorted and formatted duplication
    report.
    '''

    dup_dict = get_list_dup_dict(a_list, threshold)
    for k, v in sorted(dup_dict.iteritems()):
        print '%s occurred %s times' % (k, v)
</code>
My question (from comment in new code):
Since I've split the task into two functions, one to return a duplication dictionary, the other to print a report based on it, I think the distinct dup_dict is needed. (I do want the get_list_dup_dict function to return a dict for possible use in other contexts.)
The alternative would be to iterate over a .copy() of the items_dict and delete items not meeting the threshold from items_dict, returning the pruned items_dict at the end. But dup_dict is never larger than such a copy, and it is smaller whenever any item falls below the threshold (with threshold set to 1 nothing is filtered out, so the two are the same size). So, the distinct dup_dict way seems no worse for memory, and usually better.
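For comparison, the two approaches described above can be sketched side by side. This is a rough illustration only: count_items, dups_by_new_dict and dups_by_pruning are names invented for the sketch, and it uses .items() rather than .iteritems() so that it also runs on later Python versions.

```python
def count_items(a_list):
    """Count occurrences of each item (same counting loop as above)."""
    counts = {}
    for i in a_list:
        counts[i] = counts.get(i, 0) + 1
    return counts


def dups_by_new_dict(a_list, threshold=1):
    """Build a second, distinct dict holding only items that meet threshold."""
    dup_dict = {}
    for k, v in count_items(a_list).items():
        if v >= threshold:
            dup_dict[k] = v
    return dup_dict


def dups_by_pruning(a_list, threshold=1):
    """Iterate over a copy and delete items below threshold from the original."""
    items_dict = count_items(a_list)
    # The copy is what gets iterated, so deleting from items_dict is safe.
    for k, v in items_dict.copy().items():
        if v < threshold:
            del items_dict[k]
    return items_dict


# Hypothetical sample list for illustration.
sample = ['a', 'b', 'a', 'c', 'b', 'a']
assert dups_by_new_dict(sample, 2) == dups_by_pruning(sample, 2)
assert dups_by_new_dict(sample, 1) == dups_by_pruning(sample, 1)
```

Both return the same dict. The pruning version always makes a full-size copy before it deletes anything, whereas the distinct-dict version only ever stores the items that pass the threshold.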
Am I overlooking yet another dict technique that would help, here? Any other improvements?
Thanks and best to all,
Brian vdB
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor