
Re: How to delete duplicate headlines in perl with HTML::TokeParser

2000-09-14 Thread Charles Galpin

Hi Gary, a simple way would be to put them in a hash before printing them, using $text as the key and $url as the value. Duplicates will disappear, with the latest being kept. You could keep the first one instead by checking whether the key exists before inserting. If you need to preserve the order, then you
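A minimal sketch of the hash approach described above, in plain Perl. The @headlines list and its example.com URLs are made-up sample data standing in for the ($text, $url) pairs pulled out with HTML::TokeParser. The archived message is cut off before it explains how to preserve order, so the @unique array in the second variant is only one possible way to do it, not necessarily what was suggested.

```perl
use strict;
use warnings;

# Hypothetical sample data: [ $text, $url ] pairs as they might come
# out of the parsing step. The same headline appears twice with
# different URLs.
my @headlines = (
    [ 'Storm hits coast', 'http://example.com/a1' ],
    [ 'Council votes',    'http://example.com/b1' ],
    [ 'Storm hits coast', 'http://example.com/a2' ],
);

# Variant 1: headline text is the hash key, so a later duplicate
# simply overwrites the earlier URL (the latest one is kept).
my %latest;
$latest{ $_->[0] } = $_->[1] for @headlines;

# Variant 2: keep the FIRST URL for each headline by checking whether
# the key exists before inserting. The @unique array is an assumption
# added here to remember the original order, since hashes are unordered.
my ( %seen, @unique );
for my $item (@headlines) {
    my ( $text, $url ) = @$item;
    next if exists $seen{$text};    # duplicate headline, skip it
    $seen{$text} = $url;
    push @unique, $item;
}

print "$_->[0] => $_->[1]\n" for @unique;
```

Variant 1 is the shortest route when order does not matter; variant 2 costs one extra array but prints the headlines in the order they arrived.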

How to delete duplicate headlines in perl with HTML::TokeParser

2000-09-14 Thread Gary Nielson

Hi, I am trying to figure out how to do something and, frankly, don't know where to begin. I am using the Perl module HTML::TokeParser to extract a list of URLs and headlines. I then get rid of those headlines that are garbage, but several times a day the same story comes over with a different URL