On 31/07/15 15:39, ltc.hots...@gmail.com wrote:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
     if not line.startswith('From'): continue
     line2 = line.strip()
     line3 = line2.split()
     line4 = line3[1]
     addresses = set()

Notice I said you had to create and initialize the set
*above* the loop.
Here you are creating a new set every time round the
loop and throwing away the old one.

     addresses.add(line4)
     count = count + 1
     print addresses

And notice I said to move the print statement
to *after* the loop so as to print the complete set,
not just the current status.

print "There were", count, "lines in the file with From as the first word"

The code produces the following output:

In [15]: %run _8_5_v_13.py
Enter file name: mbox-short.txt
set(['stephen.marqu...@uct.ac.za'])
set(['stephen.marqu...@uct.ac.za'])
set(['lo...@media.berkeley.edu'])

That's correct because you create a new set each time
and add precisely one element to it before throwing
it away and starting over next time round.
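
A minimal sketch of that effect, using made-up addresses rather than
the real file (Python 3 syntax; the names found and snapshots are
invented for illustration and are not in the original script):

```python
# Hypothetical addresses standing in for the second word of each From line
found = ["a@example.com", "a@example.com", "b@example.com"]

snapshots = []
for address in found:
    addresses = set()        # a brand-new, empty set on every pass
    addresses.add(address)   # so it only ever holds this one address
    snapshots.append(addresses)

# Every set printed here has exactly one element
print(snapshots)
```

Each pass rebinds addresses to a fresh set, so nothing accumulates
from one line to the next.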

Question no. 1: is there a built-in function for set that parses the data for 
duplicates?

No, because that's what a set does. It is a collection of
unique items. It will not allow duplicates.
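
As a small illustration (a sketch with invented sample data, in
Python 3 syntax), building a set from a list that contains repeats
keeps only one copy of each value:

```python
# Hypothetical sample data, not taken from mbox-short.txt
senders = ["a@example.com", "b@example.com", "a@example.com"]

unique_senders = set(senders)   # duplicates collapse automatically

print(len(senders))             # number of items in the list
print(len(unique_senders))      # number of distinct items in the set
```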

Your problem is you create a new set of one item for
every line. So you have multiple sets with the same
data in them.

  Question no. 2: Why is there not a built-in function for append?

add() is the equivalent of append for a set.
If you try to add() a value that already exists it
will be ignored.
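
A short sketch of that behaviour, again with invented addresses and
Python 3 syntax:

```python
addresses = set()

addresses.add("a@example.com")
addresses.add("a@example.com")   # already present: ignored, no error raised
addresses.add("b@example.com")

# sorted() is used only to get a stable display order; sets are unordered
print(sorted(addresses))
```

Unlike a list's append(), repeating the same add() call does not make
the collection grow.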

Question no. 3: If all else fails, i.e., append & set, is my only option to 
slice the data set?

No, there are lots of other options, but none of them are
necessary because a set is a collection of unique values.
You just need to use it properly. Read my instructions
again, carefully:

You do that by first creating an empty set above
the loop, let's call it addresses:

addresses = set()

Then replace your print statement with the set add()
method:

addresses.add(line4)

This means that at the end of your loop you will have
a set containing all of the unique addresses you found.
You now print the set.
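
Put together, those steps might look like the sketch below. It is
written in Python 3 syntax (the original script is Python 2), and the
function name collect_addresses, the length guard, and the sample
lines are assumptions made so the example runs without the real file.
Like the original, it matches any line starting with 'From', which
includes 'From:' header lines.

```python
def collect_addresses(lines):
    """Return (set of unique sender addresses, number of From lines)."""
    addresses = set()              # created once, *above* the loop
    count = 0
    for line in lines:
        if not line.startswith('From'):
            continue
        words = line.split()       # split() also discards the trailing newline
        if len(words) < 2:         # guard added for this sketch only
            continue
        addresses.add(words[1])    # a repeated address is simply ignored
        count = count + 1
    return addresses, count


# Hypothetical stand-in for the lines of mbox-short.txt
sample = [
    "From a@example.com Sat Jan  5 09:14:16 2008\n",
    "Subject: hello\n",
    "From: a@example.com\n",
    "From b@example.com Fri Jan  4 18:10:48 2008\n",
]

addresses, count = collect_addresses(sample)

# The set is printed once, *after* the loop has finished
print(sorted(addresses))
print("There were", count, "lines in the file with From as the first word")
```

With a real file, the open file handle can be passed in place of
sample, since iterating over it yields one line at a time.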



--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
