Hi,

I am trying to figure out how to do something and, frankly, don't know where to begin. I am using the Perl module HTML::TokeParser to extract a list of URLs and headlines. I then get rid of the headlines that are garbage, but several times a day the same story comes over with a different URL but the same headline. I need only the latest version of the story, but how do I check for duplicate headlines and get rid of all but the first one in the list?

My code thus far is:

    # Headlines matching this pattern are treated as garbage.
    $ignoreItems = '^.*Schedule$|Bc-Fbc-|Eds:|^\(|By The';

    use HTML::TokeParser;
    $p = HTML::TokeParser->new(shift||"testwires.htm");

    # Walk every <a> tag, taking the href and the link text.
    while (my $token = $p->get_tag("a")) {
        my $url = $token->[1]{href} || "-";
        my $text = $p->get_trimmed_text("/a");
        if ($text =~ $ignoreItems) {
            print "";    # skip garbage headlines
        } else {
            print "$url\t$text\n";
        }
    }

which produces something like the following:

    20000913-w4apf/f6590.html	Former Dolphins Qb Strock Named Coach
    20000913-w4apf/f6591.html	Former Dolphins Qb Strock Named Coach
    20000913-w4apf/k3225.html	Illinois Qb An Example For Boller
    20000913-w4apf/k3242.html	Cardinals Revenge-Minded

Any advice on how to check the $text variable against the previous entry and not print out the $url and $text if the previous entry for $text is the same?

Any pointers or suggestions appreciated.

Gary
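
One common Perl idiom for this kind of problem is a "seen" hash keyed on the headline. The sketch below is only an illustration under stated assumptions: dedupe_by_headline is a hypothetical helper (not part of HTML::TokeParser), the sample data is copied from the output shown above, and it assumes that "duplicate" means an exactly identical headline string and that the first occurrence in the list is the one to keep.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TokeParser): takes a list of
# [url, headline] pairs and returns only the first pair seen for each
# headline, preserving the original order.
sub dedupe_by_headline {
    my @items = @_;
    my %seen;    # headline => number of times encountered so far
    return grep { !$seen{ $_->[1] }++ } @items;
}

# Sample data copied from the output shown in the question.
my @items = (
    [ '20000913-w4apf/f6590.html', 'Former Dolphins Qb Strock Named Coach' ],
    [ '20000913-w4apf/f6591.html', 'Former Dolphins Qb Strock Named Coach' ],
    [ '20000913-w4apf/k3225.html', 'Illinois Qb An Example For Boller' ],
    [ '20000913-w4apf/k3242.html', 'Cardinals Revenge-Minded' ],
);

# Print one tab-separated line per unique headline.
print "$_->[0]\t$_->[1]\n" for dedupe_by_headline(@items);
```

The same idea should also fit directly inside the existing while loop: declare `my %seen;` before the loop and put `next if $seen{$text}++;` just before the print. Unlike comparing against only the previous entry, this also catches duplicates that are not adjacent. If the latest version is actually the last one in the list rather than the first, one option is to collect the pairs, reverse the list, filter, and reverse again.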