John W. Krahn wrote:
> According to HTML::Entities
>
> # Some extra Latin 1 chars that are listed in the HTML3.2 draft
> (21-May-96)
> copy => '©', # copyright sign
> reg => '®', # registered sign
> nbsp => "\240", # non breaking space
Thanks, John, I had no idea where to look. I didn't know a non-breaking
space was an actual character; I thought it was just a directive to the
browser. I have corrected the code below accordingly, and it prints
"line 1line 3" as desired.
use strict;
use warnings;
use HTML::TokeParser;

my $p = HTML::TokeParser->new(*DATA) or die "Can't open: $!";
while (my $tag = $p->get_tag())
{
    # get_tag() returns tag names in lower case
    if ($tag->[0] eq "dd")
    {
        my $text = $p->get_trimmed_text();
        # also strip \240 (non-breaking space), which
        # get_trimmed_text() left in place
        $text =~ s/^[\s\240]*(.*?)[\s\240]*$/$1/;
        print $text;
    }
}
__DATA__
<DD>line 1</DD>
<DD>&nbsp;</DD>
<DD>line 3</DD>
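
For anyone who wants to try the trimming step without HTML::TokeParser installed, here is a minimal sketch using only core Perl. The helper name trim_nbsp and the sample strings are made up for illustration, and it splits the single substitution above into two anchored ones, which is a common alternative rather than what the script above does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): strip leading and
# trailing whitespace, treating the non-breaking space
# (octal 240, hex A0) as whitespace too.
sub trim_nbsp {
    my ($text) = @_;
    $text =~ s/^[\s\xA0]+//;    # leading whitespace and NBSP
    $text =~ s/[\s\xA0]+$//;    # trailing whitespace and NBSP
    return $text;
}

# Sample inputs: plain text, a lone NBSP, and text padded with both kinds
my @samples = ("line 1", "\xA0", " \xA0line 3\xA0 ");
print join("|", map { trim_nbsp($_) } @samples), "\n";
# prints "line 1||line 3"
```

The empty field between the two bars is the lone non-breaking space, trimmed down to nothing.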