On 31/07/15 15:39, ltc.hots...@gmail.com wrote:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
    if not line.startswith('From'): continue
    line2 = line.strip()
    line3 = line2.split()
    line4 = line3[1]
    addresses = set()
Notice I said you had to create and initialize the set *above* the loop. Here you are creating a new set every time round the loop and throwing away the old one.
    addresses.add(line4)
    count = count + 1
    print addresses
And notice I said to move the print statement to *after* the loop so as to print the complete set, not just the current status.
print "There were", count, "lines in the file with From as the first word"

The code produces the following output:

In [15]: %run _8_5_v_13.py
Enter file name: mbox-short.txt
set(['stephen.marqu...@uct.ac.za'])
set(['stephen.marqu...@uct.ac.za'])
set(['lo...@media.berkeley.edu'])
That's correct, because you create a new set each time and add precisely one element to it before throwing it away and starting over next time round.
Question no. 1: is there a built-in function for set that parses the data for duplicates?
No, because that's what a set does. It is a collection of unique items. It will not allow duplicates. Your problem is that you create a new set of one item for every line. So you have multiple sets with the same data in them.
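To see the difference concretely, here is a minimal sketch. The sample addresses and the names samples, inside and outside are made up for illustration; they are not taken from mbox-short.txt or from your script.

```python
# Made-up sample data for illustration only.
samples = ["a@example.com", "b@example.com", "a@example.com"]

# Set created INSIDE the loop: every pass throws away the old set,
# so it never holds more than the current item.
for item in samples:
    inside = set()
    inside.add(item)

# Set created ONCE, above the loop: items accumulate and
# the repeated address is stored only once.
outside = set()
for item in samples:
    outside.add(item)

print(len(inside))   # 1
print(len(outside))  # 2
```

The only change between the two loops is where set() is called.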
Question no. 2: Why is there not a built-in function for append?
add() is the equivalent of append for a set. If you try to add() a value that already exists it will be ignored.
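A short sketch of that behaviour, using made-up addresses (the name as_list is hypothetical, added only for the comparison with a list):

```python
addresses = set()
addresses.add("a@example.com")
addresses.add("a@example.com")   # already present, so silently ignored
addresses.add("b@example.com")
print(len(addresses))            # 2

# For comparison, a list's append() keeps every value, duplicates included.
as_list = []
as_list.append("a@example.com")
as_list.append("a@example.com")
as_list.append("b@example.com")
print(len(as_list))              # 3
```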
Question no. 3: If all else fails, i.e., append & set, is my only option to slice the data set?
No, there are lots of other options, but none of them are necessary because a set is a collection of unique values. You just need to use it properly. Read my instructions again, carefully:
You do that by first creating an empty set above the loop, let's call it addresses:

addresses = set()

Then replace your print statement with the set add() method:

addresses.add(line4)

This means that at the end of your loop you will have a set containing all of the unique addresses you found. You now print the set.
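Put together, those steps might look like the sketch below. It is an illustration under some assumptions, not a drop-in replacement: the loop is wrapped in a hypothetical function unique_senders() so it can be run on a few made-up sample lines instead of prompting for a file, and print is called with a single argument so the code runs under both Python 2 and Python 3.

```python
def unique_senders(lines):
    """Return (set of addresses, count) for lines starting with 'From'."""
    addresses = set()              # created once, ABOVE the loop
    count = 0
    for line in lines:
        if not line.startswith('From'):
            continue
        words = line.strip().split()
        addresses.add(words[1])    # duplicates are ignored by the set
        count = count + 1
    return addresses, count

# Made-up lines in the same general shape as an mbox file.
sample_lines = [
    "From a@example.com Sat Jan  5 09:14:16 2008\n",
    "Subject: hello\n",
    "From: a@example.com\n",
    "From b@example.com Fri Jan  4 18:10:48 2008\n",
]

addresses, count = unique_senders(sample_lines)
print(addresses)   # printed once, AFTER the loop
print(count)       # 3
```

An open file can be passed in place of sample_lines, for example unique_senders(open(fname)), because iterating over a file object yields its lines.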
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos