Hi Emile,

I made a mistake: I incorrectly assumed that the difference between 54 lines
of output and 27 lines of output was the result of removing duplicate email
addresses, e.g., gsil...@umich.edu, gsil...@umich.edu, c...@iupui.edu,
c...@iupui.edu


Apparently, this is not the case and I was wrong :(
The answer is in the desired output, which lists one address per line:

stephen.marqu...@uct.ac.za
lo...@media.berkeley.edu
zq...@umich.edu
rjl...@iupui.edu
zq...@umich.edu
rjl...@iupui.edu
c...@iupui.edu
c...@iupui.edu
gsil...@umich.edu
gsil...@umich.edu
zq...@umich.edu
gsil...@umich.edu
wagne...@iupui.edu
zq...@umich.edu
antra...@caret.cam.ac.uk
gopal.ramasammyc...@gmail.com
david.horw...@uct.ac.za
david.horw...@uct.ac.za
david.horw...@uct.ac.za
david.horw...@uct.ac.za
stephen.marqu...@uct.ac.za
lo...@media.berkeley.edu
lo...@media.berkeley.edu
r...@media.berkeley.edu
c...@iupui.edu
c...@iupui.edu
c...@iupui.edu
There were 27 lines in the file with From as the first word
That is, the desired output is the full list of addresses, not a set.

Latest output:
set(['stephen.marqu...@uct.ac.za', 'lo...@media.berkeley.edu',
'zq...@umich.edu', 'rjl...@iupui.edu', 'c...@iupui.edu', 'gsil...@umich.edu',
'wagne...@iupui.edu', 'antra...@caret.cam.ac.uk',
'gopal.ramasammyc...@gmail.com', 'david.horw...@uct.ac.za',
'r...@media.berkeley.edu']) ← Mismatch
There were 54 lines in the file with From as the first word

Latest revised code:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
addresses = set()
for line in fh:
    if line.startswith('From'):
        line2 = line.strip()
        line3 = line2.split()
        line4 = line3[1]
        addresses.add(line4)
        count = count + 1
print addresses
print "There were", count, "lines in the file with From as the first word"
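
For comparison, here is a minimal sketch of how the two counts can differ. It assumes Emile's suggestion of matching 'From ' (with a trailing space) and keeps the addresses in a list instead of a set, so duplicates survive. The sample text and its addresses are made up and stand in for mbox-short.txt, the helper name collect_addresses is hypothetical, and the sketch uses Python 3 syntax (print as a function), not the Python 2 used in the code above.

```python
# Hypothetical sample standing in for mbox-short.txt; the addresses are made up.
SAMPLE = """\
From alice@example.com Sat Jan  5 09:14:16 2008
From: alice@example.com
Subject: first message
From bob@example.org Fri Jan  4 18:10:48 2008
From: bob@example.org
From alice@example.com Fri Jan  4 16:10:39 2008
From: alice@example.com
"""


def collect_addresses(lines, prefix):
    """Return the second token of every line that starts with prefix."""
    found = []
    for line in lines:
        if line.startswith(prefix):
            tokens = line.split()
            if len(tokens) > 1:  # skip lines with nothing after the prefix
                found.append(tokens[1])
    return found


lines = SAMPLE.splitlines()
loose = collect_addresses(lines, "From")    # matches 'From ' and 'From:' lines
strict = collect_addresses(lines, "From ")  # matches only 'From ' lines

# A list keeps duplicates and order; a set would collapse repeated addresses.
for addr in strict:
    print(addr)
print("There were", len(strict), "lines in the file with From as the first word")
```

On this sample the loose prefix matches 6 lines and the strict prefix matches 3, the same two-to-one ratio as 54 versus 27. Converting strict to a set would leave only 2 addresses, which is why a set cannot reproduce the desired output.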

Regards,
Hal

On Sat, Aug 1, 2015 at 5:45 PM, Emile van Sebille <em...@fenx.com> wrote:

> On 8/1/2015 4:07 PM, Ltc Hotspot wrote:
>
>> Hi Alan,
>>
>> Question1: Is the output result an address or a line?
>>
>
> It's a set actually.  Ready to be further processed I imagine.  Or to
> print out line by line if desired.
>
>> Question2: Why are there 54 lines as compared to 27 lines in the desired
>> output?
>>
>
> Because there are 54 lines that start with 'From'.
>
> As I noted in looking at your source data, for each email there's a
> 'From ' and a 'From:' -- you'd get the right answer checking only for
> startswith('From ')
>
> Emile
>
>
>
>
>> Here is the latest revised code:
>> fname = raw_input("Enter file name: ")
>> if len(fname) < 1 : fname = "mbox-short.txt"
>> fh = open(fname)
>> count = 0
>> addresses = set()
>> for line in fh:
>>      if line.startswith('From'):
>>          line2 = line.strip()
>>          line3 = line2.split()
>>          line4 = line3[1]
>>          addresses.add(line4)
>>          count = count + 1
>> print addresses
>> print "There were", count, "lines in the file with From as the first word"
>>
>> The output result:
>> set(['stephen.marqu...@uct.ac.za', 'lo...@media.berkeley.edu',
>> 'zq...@umich.edu', 'rjl...@iupui.edu', 'c...@iupui.edu',
>> 'gsil...@umich.edu',
>> 'wagne...@iupui.edu', 'antra...@caret.cam.ac.uk',
>> 'gopal.ramasammyc...@gmail.com', 'david.horw...@uct.ac.za',
>> 'r...@media.berkeley.edu']) ← Mismatch
>> There were 54 lines in the file with From as the first word
>>
>>
>> The desired output result:
>> stephen.marqu...@uct.ac.za
>> lo...@media.berkeley.edu
>> zq...@umich.edu
>> rjl...@iupui.edu
>> zq...@umich.edu
>> rjl...@iupui.edu
>> c...@iupui.edu
>> c...@iupui.edu
>> gsil...@umich.edu
>> gsil...@umich.edu
>> zq...@umich.edu
>> gsil...@umich.edu
>> wagne...@iupui.edu
>> zq...@umich.edu
>> antra...@caret.cam.ac.uk
>> gopal.ramasammyc...@gmail.com
>> david.horw...@uct.ac.za
>> david.horw...@uct.ac.za
>> david.horw...@uct.ac.za
>> david.horw...@uct.ac.za
>> stephen.marqu...@uct.ac.za
>> lo...@media.berkeley.edu
>> lo...@media.berkeley.edu
>> r...@media.berkeley.edu
>> c...@iupui.edu
>> c...@iupui.edu
>> c...@iupui.edu
>> There were 27 lines in the file with From as the first word
>>
>> Regards,
>> Hal
>>
>> On Sat, Aug 1, 2015 at 1:40 PM, Alan Gauld <alan.ga...@btinternet.com>
>> wrote:
>>
>>> On 01/08/15 19:48, Ltc Hotspot wrote:
>>>
>>>> There is an indent message in the revised code.
>>>> Question: Where should I indent the code line for the loop?
>>>>
>>>>
>>> Do you understand the role of indentation in Python?
>>> Everything in the indented block is part of the structure,
>>> so you need to indent everything that should be executed
>>> as part of the logical block.
>>>
>>>> fname = raw_input("Enter file name: ")
>>>> if len(fname) < 1 : fname = "mbox-short.txt"
>>>> fh = open(fname)
>>>> count = 0
>>>> addresses = set()
>>>> for line in fh:
>>>>       if line.startswith('From'):
>>>>       line2 = line.strip()
>>>>       line3 = line2.split()
>>>>       line4 = line3[1]
>>>>       addresses.add(line)
>>>>       count = count + 1
>>>>
>>>>
>>> Everything after the if line should be indented an extra level
>>> because you only want to do those things if the line
>>> startswith From.
>>>
>>> And note that, as I suspected, you are adding the whole line
>>> to the set when you should only be adding the address.
>>> (ie line4). This would be more obvious if you had
>>> used meaningful variable names such as:
>>>
>>>      strippedLine = line.strip()
>>>      tokens = strippedLine.split()
>>>      addr = tokens[1]
>>>      addresses.add(addr)
>>>
>>> PS.
>>> Could you please delete the extra lines from your messages.
>>> Some people pay by the byte and don't want to receive kilobytes
>>> of stuff they have already seen multiple times.
>>>
>>>
>>> --
>>> Alan G
>>> Author of the Learn to Program web site
>>> http://www.alan-g.me.uk/
>>> http://www.amazon.com/author/alan_gauld
>>> Follow my photo-blog on Flickr at:
>>> http://www.flickr.com/photos/alangauldphotos
>>>
>>>
>>> _______________________________________________
>> Tutor maillist  -  Tutor@python.org
>> To unsubscribe or change subscription options:
>> https://mail.python.org/mailman/listinfo/tutor
>>
>>