Scott Taylor wrote:
: Is there a better, maybe more elegant, way to do this? I don't
: mind using HTML::Parser if I could only figure out how.
use strict;
use warnings;
use HTML::TokeParser;

my $html = q(
This is a line of HTML:people write strange things here<br>
and hardly ever follow proper<p>
syntax A&B suck at spelling as well<br>
So I need to clean it up and strip out all<br>
words less then 3 characters in length.<p>
Later the words will go into an indexer for<br>
searching a database
);

my $p = HTML::TokeParser->new( \$html );
while ( my $token = $p->get_token ) {
    my ( $type, $tag ) = @$token;
    if ( $type eq 'T' ) {
        # get_token has already consumed this text, so push it back and
        # let get_trimmed_text return it with the whitespace collapsed.
        $p->unget_token($token);
        print $p->get_trimmed_text;
    }
    elsif ( $type eq 'S' and ( $tag eq 'br' or $tag eq 'p' ) ) {
        # Treat <br> and <p> as line breaks.
        print "\n";
    }
}
__END__
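
The script above only breaks the HTML into lines of text. To drop the
words shorter than 3 characters before they go to the indexer, something
like the following might work. Treat it as a rough sketch: long_words is
a name I made up, and it assumes a "word" is any run of \w characters
(letters, digits, underscore), which may not match what your indexer
expects.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words in $text that are at least
# $min_length characters long (default 3). A word is assumed to be
# a run of \w characters.
sub long_words {
    my ( $text, $min_length ) = @_;
    $min_length = 3 unless defined $min_length;
    return grep { length($_) >= $min_length } $text =~ /(\w+)/g;
}

my @words = long_words('So I need to clean it up and strip out all');
print "@words\n";    # prints "need clean and strip out all"
```

You could call long_words on each string returned by get_trimmed_text
and pass the resulting list on to the indexer.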
HTH,
Charles K. Clarkson