I have used this little routine to strip HTML. It might be inefficient; I
don't know.
Assuming the HTML has been loaded into the variable $html:
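$html =~ s/\n/ /g;            # newlines -> spaces, so words on separate lines don't run together
$html =~ s/>/>\n/g;           # break after every >, so each line holds at most one tag
my @html = split /\n/, $html;
my $newhtml = '';
foreach my $chunk (@html) {
    $chunk =~ s/<[^>]*>//g;   # remove the tag (if any) on this line
    $newhtml .= $chunk;
}
print $newhtml;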
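The same effect can probably be had without splitting into lines at all. A negated character class, [^>]*, cannot run past the first >, so each match covers exactly one tag, and it matches newlines too. This is only a sketch: the sub name strip_tags is made up for illustration, and it will still be fooled by a > inside an attribute value, a comment, or a <script> block.

```perl
use strict;
use warnings;

# strip_tags is a hypothetical helper name, not from the original post.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<[^>]*>//g;   # [^>]* stops at the first >, so each match is one tag
    return $html;
}

print strip_tags('this is a <font size="2">large word</font>in size 2'), "\n";
# prints: this is a large wordin size 2
```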
Agustin Rivera
----- Original Message -----
From: "Etienne Marcotte" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, November 13, 2001 11:44 AM
Subject: YARQ (Yet Another Regexp Question)
> I saw a good regexp for removing HTML tags somewhere on the web. I can't
> find it again, and it needed some minor modifications.
>
> Let's say $line is 'this is a <font size="2">large word</font>in
> size 2';
>
> I played around a little, but it always removed everything between the
> first < and the last > (and I know the tutorial on the web explained how
> to avoid this).
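What is described here is greedy matching: .* takes as much as it can, so <.*> runs from the first < to the last > on the line. Adding ? makes the quantifier non-greedy, so each match stops at the first > after its <. A minimal comparison using the $line from the question:

```perl
use strict;
use warnings;

my $line = 'this is a <font size="2">large word</font>in size 2';

(my $greedy = $line) =~ s/<.*>//g;    # .* grabs as much as possible: first < to LAST >
(my $lazy   = $line) =~ s/<.*?>//g;   # .*? stops at the first > after each <

print "greedy: $greedy\n";   # greedy: this is a in size 2
print "lazy:   $lazy\n";     # lazy:   this is a large wordin size 2
```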
>
> I'd like to write something like this (I know this one isn't right, but
> please help me place the parentheses, [] and {} :)
>
> .* < (.*) \s .* > .* </ \1 > .*
> this is a < font size="2" > large word </ font > in size 2
>
> The line above shows what each part of the pattern should match...
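The idea in that pattern (capture the tag name, then require a closing tag with the same name via \1) can be written roughly as below. This is a sketch, not a full solution: it assumes tag names are plain \w+ words, and it will not pair nested tags of the same name correctly (for example <b><b>x</b></b>). The /s modifier lets the enclosed text span lines.

```perl
use strict;
use warnings;

my $line = 'this is a <font size="2">large word</font>in size 2';

# (\w+)    captures the tag name ("font")
# [^>]*    skips any attributes up to the closing >
# (.*?)    captures the enclosed text, non-greedily
# </\1>    requires a closing tag with the SAME name (backreference \1)
if ($line =~ m{<(\w+)[^>]*>(.*?)</\1>}s) {
    print "tag:  $1\n";   # tag:  font
    print "text: $2\n";   # text: large word
}

# Replace each matched pair with just its contents.
(my $plain = $line) =~ s{<(\w+)[^>]*>(.*?)</\1>}{$2}gs;
print "$plain\n";         # this is a large wordin size 2
```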
>
> Thanks for any help...
>
> Also, is there a way to specify a list of allowed tags, or a list of
> disallowed tags? For example, if the (.*) is foo or bar, delete the tag;
> if it is something else, keep it.
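One way to do this is a hash of tag names plus the /e modifier, which evaluates the replacement as Perl code, so each tag can be kept or dropped individually. A rough sketch; the hash %strip, its contents, and the sub name strip_listed_tags are all hypothetical:

```perl
use strict;
use warnings;

# Hypothetical list of tags to strip; every other tag is kept.
my %strip = map { $_ => 1 } qw(font foo bar);

sub strip_listed_tags {
    my ($html) = @_;
    # </?      optional slash, so closing tags are caught too
    # (\w+)    the tag name, in $2
    # [^>]*>   attributes up to the end of the tag
    # /e evaluates the replacement: drop the tag if listed, else put it back unchanged
    $html =~ s{(</?(\w+)[^>]*>)}{ $strip{lc $2} ? '' : $1 }ge;
    return $html;
}

print strip_listed_tags('a <font size="2">big</font> <b>bold</b> word'), "\n";
# prints: a big <b>bold</b> word
```

For an allow-list instead, swap the two branches (keep $1 when the tag is in the hash, return '' otherwise).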
>
> I don't think this is very clear, but I'll be happy to give more details
> on what I'm trying to accomplish if you need them.
>
> Etienne
>
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]