At 05:52 PM 3/14/2002 +0200, Ando Saabas wrote:
>Ok, let me explain my problem a bit further. I need a regular expression to
>strip script tags from an HTML page.
>I used: $file = eregi_replace("(<script(.*)>.*</script>)", " ", $file);
>Now this works fine, until there's a webpage like:
>
><script something>script data.</script>
>Some webpage data
><script something>another script data </script>
>
>so the regexp above replaces everything between the first <script > and the
>last </script>, i.e. the webpage data also.
>So I thought to change the regexp to something like this:  $file =
>eregi_replace("(<script(.*)>NOT(script)</script>)", " ", $file);
>where NOT(script) would match everything that does not contain the word
>script

I suspect that the POSIX extended regular expression functions will not be 
sufficient to do what you want, since the .* in your pattern is greedy and 
matches as much as it can.  Most likely you will need to use the PCRE 
functions (preg_replace, etc.).  I tried to come up with a regex to do what 
you are looking for, but it's beyond me.  I think it may have something to 
do with what is called a "negative look-ahead assertion", although I 
couldn't personally get it to work.  You can read about negative look-ahead 
assertions here:

http://www.perldoc.com/perl5.6.1/pod/perlre.html

You may be better off asking this question on a Perl newsgroup or mailing 
list...
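
For what it's worth, here is a minimal sketch of how this might look with 
preg_replace.  It assumes PCRE support is available in your PHP build and 
that script blocks never nest.  The first pattern relies on a non-greedy 
.*? and the second on a negative look-ahead; the variable names and the 
sample input are only for illustration, and I would not call either one a 
robust HTML sanitizer:

```php
<?php
// Sample input modelled on the page described in the question.
$file = "<script something>script data.</script>\n"
      . "Some webpage data\n"
      . "<script something>another script data </script>\n";

// Variant 1: non-greedy match.
// [^>]* keeps the opening-tag match inside a single tag, .*? stops at the
// first </script>, /i ignores case and /s lets . match newlines.
$lazyPattern = '#<script\b[^>]*>.*?</script>#is';
$cleanLazy = preg_replace($lazyPattern, ' ', $file);

// Variant 2: negative look-ahead.
// (?:(?!</script>).)* consumes one character at a time, but only while the
// text at the current position is not the start of "</script>".
$lookaheadPattern = '#<script\b[^>]*>(?:(?!</script>).)*</script>#is';
$cleanLookahead = preg_replace($lookaheadPattern, ' ', $file);

echo $cleanLazy;
```

On this input both variants should leave "Some webpage data" in place and 
replace each script block with a single space.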


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
