From: "Jon Shoberg" <[EMAIL PROTECTED]>
> I need to remove HTML scripts from some pages.
>
> I have to replace
>
> <*script*>*</*script*> with blanks. This includes all
> javascript/vbscript in between the tags
>
> I'm using the * as guidelines to show it must match several
> variations.
>
> Thoughts ? Ideas? Suggestions?

    use strict;
    use warnings;
    use HTML::JFilter; # http://Jenda.Krynicky.cz/#HTML::JFilter

    # Read the list of allowed tags and attributes in one go.
    # http://Jenda.Krynicky.cz/perl/DefaultAllowedHTML.txt
    my $filter_tags;
    open my $fh, '<', 'DefaultAllowedHTML.txt'
        or die "Cannot open DefaultAllowedHTML.txt: $!";
    { local $/; $filter_tags = <$fh>; }
    close $fh;

    my $filter = HTML::JFilter->new($filter_tags, 'ssi');

    # Filter a file into another file ...
    $filter->doFILE($path_to_the_file, $path_to_output_file);
    # ... or filter a string in memory.
    my $result = $filter->doSTRING($source);

The module uses HTML::Parser to parse the HTML (so you should be
fairly safe with it) and filters out all tags and attributes NOT
specified in the parameter to "new HTML::JFilter". For tags like
<script> it removes the tag together with its body.

Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
	-- Terry Pratchett in Sourcery
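
If all you need is the <script> part of the above (drop the tag together with its body, leave everything else alone), something like the following may be enough. This is only a sketch that uses HTML::Parser directly; it is not how HTML::JFilter is implemented, and strip_scripts is a made-up helper name, not part of either module.

```perl
use strict;
use warnings;
use HTML::Parser;

# strip_scripts is a hypothetical helper (an assumption for this sketch),
# not part of HTML::JFilter or HTML::Parser.
sub strip_scripts {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Append the original source text of every parsed event to $out.
        default_h   => [ sub { $out .= $_[0] }, 'text' ],
    );

    # Generate no events for <script> elements: the start tag,
    # the content and the end tag are all skipped.
    $parser->ignore_elements('script');

    $parser->parse($html);
    $parser->eof;
    return $out;
}

print strip_scripts('<p>Hi</p><script>alert(1)</script><p>Bye</p>'), "\n";
```

Unlike HTML::JFilter, this does no whitelisting at all, so event-handler attributes such as onclick and javascript: URLs pass through untouched.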
