At 05:52 PM 3/14/2002 +0200, Ando Saabas wrote:
>Ok let me explain my problem further some. I need the regular expression to
>purify the html page from script tags:
>I used: $file = eregi_replace("(<script(.*)>.*</script>)", " ", $file);
>Now this works fine, until theres a webpage like:
>
><script something>script data.</script>
>Some webpage data
><script something>another script data </script>
>
>so the regexp above replaces everything between first <script > and last
></script> ie the webpage data also.
>So i thought to change the regexp to something like this: $file =
>eregi_replace("(<script(.*)>NOT(script)</script>)", " ", $file);
>where NOT(script) would match everything that contains word script

I suspect that the POSIX extended regular expression functions will not be sufficient to do what you want. Most likely you will need to use the PCRE functions (preg_replace, etc.).

I tried to come up with a regex to do what you are looking for, but it's beyond me. I think it may have something to do with what is called a "negative look-ahead assertion", although I couldn't personally get it to work. You can read about negative look-ahead assertions here:

http://www.perldoc.com/perl5.6.1/pod/perlre.html

You may be better off asking this question on a Perl newsgroup or mailing list...

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
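
For what it's worth, here is a minimal sketch of how the PCRE route might look, assuming preg_replace is available. The function names strip_script_blocks and strip_script_blocks_lookahead are made up for illustration and appear nowhere in the thread. The first variant uses a lazy quantifier (.*?) so each match stops at the nearest </script>. The second uses a negative look-ahead, which is closer in spirit to the NOT(...) idea in the quoted message. Treat both as untested suggestions, not a complete HTML cleaner.

```php
<?php
// Hypothetical helper: remove each <script>...</script> block separately.
// .*? is lazy, so it stops at the first </script> it can reach.
// Modifier i = case-insensitive, s = dot also matches newlines.
function strip_script_blocks(string $html): string
{
    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace('#<script\b[^>]*>.*?</script>#is', ' ', $html) ?? $html;
}

// Hypothetical alternative using a negative look-ahead: consume one
// character at a time, but only while "</script>" does not start here.
function strip_script_blocks_lookahead(string $html): string
{
    return preg_replace(
        '#<script\b[^>]*>(?:(?!</script>).)*</script>#is',
        ' ',
        $html
    ) ?? $html;
}

// Sample input modelled on the page from the quoted message.
$file = "<script something>script data.</script>\n"
      . "Some webpage data\n"
      . "<script something>another script data </script>\n";

$clean = strip_script_blocks($file);
echo $clean;
```

On the sample above, both variants should leave "Some webpage data" in place and replace each script block with a single space. The lazy version is the simpler of the two; the look-ahead version is mainly useful if you later need to forbid something other than the closing tag.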