Hi Emile,

I made a mistake: I incorrectly assumed that the difference between the 54 lines of output and the 27 lines of output was the result of removing duplicate email addresses, e.g., gsil...@umich.edu, gsil...@umich.edu, c...@iupui.edu, c...@iupui.edu.
Apparently, this is not the case and I was wrong :(

The solution to the problem is in the desired line output:

stephen.marqu...@uct.ac.za
lo...@media.berkeley.edu
zq...@umich.edu
rjl...@iupui.edu
zq...@umich.edu
rjl...@iupui.edu
c...@iupui.edu
c...@iupui.edu
gsil...@umich.edu
gsil...@umich.edu
zq...@umich.edu
gsil...@umich.edu
wagne...@iupui.edu
zq...@umich.edu
antra...@caret.cam.ac.uk
gopal.ramasammyc...@gmail.com
david.horw...@uct.ac.za
david.horw...@uct.ac.za
david.horw...@uct.ac.za
david.horw...@uct.ac.za
stephen.marqu...@uct.ac.za
lo...@media.berkeley.edu
lo...@media.berkeley.edu
r...@media.berkeley.edu
c...@iupui.edu
c...@iupui.edu
c...@iupui.edu
There were 27 lines in the file with From as the first word

The desired output is not the output of a set.

Latest output:

set(['stephen.marqu...@uct.ac.za', 'lo...@media.berkeley.edu',
'zq...@umich.edu', 'rjl...@iupui.edu', 'c...@iupui.edu',
'gsil...@umich.edu', 'wagne...@iupui.edu', 'antra...@caret.cam.ac.uk',
'gopal.ramasammyc...@gmail.com', 'david.horw...@uct.ac.za',
'r...@media.berkeley.edu']) ← Mismatch
There were 54 lines in the file with From as the first word

Latest revised code:

fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    if line.startswith('From'):
        line2 = line.strip()
        line3 = line2.split()
        line4 = line3[1]
        addresses.add(line4)
        count = count + 1
print addresses
print "There were", count, "lines in the file with From as the first word"

Regards,
Hal

On Sat, Aug 1, 2015 at 5:45 PM, Emile van Sebille <em...@fenx.com> wrote:

> On 8/1/2015 4:07 PM, Ltc Hotspot wrote:
>
>> Hi Alan,
>>
>> Question1: The output result is an address or line?
>
> It's a set actually. Ready to be further processed I imagine. Or to
> print out line by line if desired.
>
>> Question2: Why are there 54 lines as compared to 27 lines in the desired
>> output?
>
> Because there are 54 lines that start with 'From'.
>
> As I noted in looking at your source data, for each email there's a
> 'From ' and a 'From:' -- you'd get the right answer checking only for
> startswith('From ')
>
> Emile
>
>> Here is the latest revised code:
>> fname = raw_input("Enter file name: ")
>> if len(fname) < 1 : fname = "mbox-short.txt"
>> fh = open(fname)
>> count = 0
>> addresses = set()
>> for line in fh:
>>     if line.startswith('From'):
>>         line2 = line.strip()
>>         line3 = line2.split()
>>         line4 = line3[1]
>>         addresses.add(line4)
>>         count = count + 1
>> print addresses
>> print "There were", count, "lines in the file with From as the first word"
>>
>> The output result:
>> set(['stephen.marqu...@uct.ac.za', 'lo...@media.berkeley.edu',
>> 'zq...@umich.edu', 'rjl...@iupui.edu', 'c...@iupui.edu',
>> 'gsil...@umich.edu', 'wagne...@iupui.edu', 'antra...@caret.cam.ac.uk',
>> 'gopal.ramasammyc...@gmail.com', 'david.horw...@uct.ac.za',
>> 'r...@media.berkeley.edu']) ← Mismatch
>> There were 54 lines in the file with From as the first word
>>
>> The desired output result:
>> stephen.marqu...@uct.ac.za
>> lo...@media.berkeley.edu
>> zq...@umich.edu
>> rjl...@iupui.edu
>> zq...@umich.edu
>> rjl...@iupui.edu
>> c...@iupui.edu
>> c...@iupui.edu
>> gsil...@umich.edu
>> gsil...@umich.edu
>> zq...@umich.edu
>> gsil...@umich.edu
>> wagne...@iupui.edu
>> zq...@umich.edu
>> antra...@caret.cam.ac.uk
>> gopal.ramasammyc...@gmail.com
>> david.horw...@uct.ac.za
>> david.horw...@uct.ac.za
>> david.horw...@uct.ac.za
>> david.horw...@uct.ac.za
>> stephen.marqu...@uct.ac.za
>> lo...@media.berkeley.edu
>> lo...@media.berkeley.edu
>> r...@media.berkeley.edu
>> c...@iupui.edu
>> c...@iupui.edu
>> c...@iupui.edu
>> There were 27 lines in the file with From as the first word
>>
>> Regards,
>> Hal
>>
>> On Sat, Aug 1, 2015 at 1:40 PM, Alan Gauld <alan.ga...@btinternet.com> wrote:
>>
>>> On 01/08/15 19:48, Ltc Hotspot wrote:
>>>
>>>> There is an indent message in the revised code.
>>>> Question: Where should I indent the code line for the loop?
>>>
>>> Do you understand the role of indentation in Python?
>>> Everything in the indented block is part of the structure,
>>> so you need to indent everything that should be executed
>>> as part of the logical block.
>>>
>>>> fname = raw_input("Enter file name: ")
>>>> if len(fname) < 1 : fname = "mbox-short.txt"
>>>> fh = open(fname)
>>>> count = 0
>>>> addresses = set()
>>>> for line in fh:
>>>>     if line.startswith('From'):
>>>>     line2 = line.strip()
>>>>     line3 = line2.split()
>>>>     line4 = line3[1]
>>>>     addresses.add(line)
>>>>     count = count + 1
>>>
>>> Everything after the if line should be indented an extra level
>>> because you only want to do those things if the line
>>> startswith From.
>>>
>>> And note that, as I suspected, you are adding the whole line
>>> to the set when you should only be adding the address
>>> (i.e. line4). This would be more obvious if you had
>>> used meaningful variable names such as:
>>>
>>>     strippedLine = line.strip()
>>>     tokens = strippedLine.split()
>>>     addr = tokens[1]
>>>     addresses.add(addr)
>>>
>>> PS.
>>> Could you please delete the extra lines from your messages.
>>> Some people pay by the byte and don't want to receive kilobytes
>>> of stuff they have already seen multiple times.
>>>
>>> --
>>> Alan G
>>> Author of the Learn to Program web site
>>> http://www.alan-g.me.uk/
>>> http://www.amazon.com/author/alan_gauld
>>> Follow my photo-blog on Flickr at:
>>> http://www.flickr.com/photos/alangauldphotos
>>>
>> _______________________________________________
>> Tutor maillist - Tutor@python.org
>> To unsubscribe or change subscription options:
>> https://mail.python.org/mailman/listinfo/tutor
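
For reference, here is a minimal sketch of what Emile's suggestion might look like, combined with Alan's advice on variable names. It is not code from the thread. It assumes Python 3 syntax (print as a function) rather than the Python 2 used above. The sample lines and the helper name sender_addresses are hypothetical stand-ins so the example runs without mbox-short.txt. It matches on 'From ' with a trailing space, so the 'From:' header lines are skipped, and it collects addresses in a list rather than a set, so duplicates and order are preserved as in the desired output.

```python
# Hypothetical sample lines standing in for mbox-short.txt (not the real data).
SAMPLE_LINES = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "Return-Path: <postmaster@example.com>\n",
    "From: alice@example.com\n",
    "From bob@example.com Fri Jan  4 18:10:48 2008\n",
    "From: bob@example.com\n",
    "From alice@example.com Fri Jan  4 16:10:39 2008\n",
    "From: alice@example.com\n",
]


def sender_addresses(lines):
    """Return the address from each 'From ' line, in order, duplicates kept."""
    addresses = []
    for line in lines:
        # 'From ' (with the trailing space) matches the envelope line
        # but not the 'From:' header that follows it in each message.
        if line.startswith("From "):
            tokens = line.split()
            if len(tokens) > 1:  # guard against a bare 'From ' line
                addresses.append(tokens[1])
    return addresses


if __name__ == "__main__":
    found = sender_addresses(SAMPLE_LINES)
    for addr in found:
        print(addr)
    print("There were", len(found), "lines in the file with From as the first word")
```

To run it against a real file, the same function could be called as sender_addresses(open(fname)), since a file object yields its lines one at a time.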