RE: [PHP] preg_match question: locating unmatched HTML tags

Andy Crain Sat, 22 Feb 2003 13:55:42 -0800

Ernest,
Thanks very much. This is pretty close to what I'm looking for. The only
problem is that it doesn't catch improperly nested tags. For example,
"some <b>text </u>some</b> text some text" makes it through without
error. I think this is because, once check_tags() has found the opening
<b>, it only looks for a later </b> and then resumes matching after that
</b>, so the stray </u> in between is never examined.
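
One way to catch this might be to keep a stack of open tags instead of
pairing each opener with the next matching closer. Below is a minimal
sketch, assuming the same tag list as check_tags(). The name
find_unbalanced_tag() and its return format are made up for
illustration, and this is not tested beyond the cases mentioned here.

```php
<?php
// Sketch only: returns the first offending tag as a string, or null
// if every suspect tag is properly nested and closed.
// find_unbalanced_tag() is a hypothetical name, not from the thread.
function find_unbalanced_tag($text) {
    $suspect_tags = 'b|i|u|strong|em|font|a|ol|ul|blockquote';
    // Match every opening or closing suspect tag, in document order.
    // \b keeps "b" from matching the start of "<br>".
    $re = '/<\s*(\/?)\s*(' . $suspect_tags . ')\b[^>]*>/i';
    preg_match_all($re, $text, $matches, PREG_SET_ORDER);

    $open = array(); // stack of currently open tag names
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            $open[] = $name;            // opener: push onto the stack
        } elseif (array_pop($open) !== $name) {
            // closer does not match the innermost open tag
            return '</' . $name . '>';
        }
    }
    // Anything still on the stack was never closed.
    return $open ? '<' . end($open) . '>' : null;
}
```

On the example above this reports </u>, and on a balanced string such as
"<b>ok <i>fine</i></b>" it returns null. It ignores tags outside the
list and does not parse attributes, so it is a sanity check rather than
a real validator.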
Andy


> -----Original Message-----
> From: Ernest E Vogelsinger [mailto:[EMAIL PROTECTED]
> Sent: Saturday, February 22, 2003 5:48 AM
> To: Andy Crain
> Cc: [EMAIL PROTECTED]
> Subject: Re: [PHP] preg_match question: locating unmatched HTML tags
> 
> At 03:35 22.02.2003, Andy Crain said:
> --------------------[snip]--------------------
> >My apologies in advance if this is too basic or there's a solution
> >easily found out there, but after lots of searching, I'm still lost.
> >
> >I'm trying to build a regexp that would parse user-supplied text and
> >identify cases where HTML tags are left open or are not properly
> >matched, e.g., <b> tags without closing </b> tags. This is for a sort
> >of message board type of application, and I'd like to allow users to
> >use some HTML, but just would like to check to ensure that no stray
> >tags are input that would screw up the rest of the page's display.
> >I'm new to regular expressions, and the one below is as far as I've
> >gotten. If anyone has any suggestions, they'd be very much
> >appreciated.
> >
> >$suspect_tags = "b|i|u|strong|em|font|a|ol|ul|blockquote ";
> >$pattern = '/<(' . $suspect_tags . '[^>]*>)(.*)(?!<\/\1)/Ui';
> >if (preg_match($pattern,$_POST['entry'],$matches)) {
> >   //do something to report the unclosed tags
> >} else {
> >   echo 'Input looks fine. No unmatched tags.';
> >}
> --------------------[snip]--------------------
> 
> Hi,
> 
> I don't believe you can create a regular expression to look for
> something that's NOT there.
> 
> I'd take this approach (tested with drawbacks, see below):
> 
> function check_tags($text) {
>         $suspect_tags = "b|i|u|strong|em|font|a|ol|ul|blockquote";
>         // \b keeps "b" from matching the start of "<br>"
>         $re_find = '/<\s*(' . $suspect_tags . ')\b.*?>(.*)/is';
> 
>         while (preg_match($re_find,$text,$matches)) {
>                 // a suspect tag was found, check if closed
>                 $suspect = $matches[1];
>                 $text = $matches[2];
>                 $re_close = '/<\s*\/\s*' . $suspect . '\s*?>(.*)/is';
>                 if (preg_match($re_close, $text, $matches)) {
>                         // fine, found matching closer, continue loop
>                         $text = $matches[1];
>                 }
>                 else {
>                         // not closed - return to report it
>                         return $suspect;
>                 }
>         }
>         return null;
> }
> 
> $text = <<<EOT
> This text contains < font
>         size=+4 > an
>         unclosed suspect </fint>tag.
> 
> EOT;
> 
> $tag = check_tags($text);
> if ($tag) echo "Unmatched: \"$tag\"\n";
> else echo "Perfect!\n";
> 
> The drawbacks: This approach is softly targeted at unintended typos,
> such as in the example text. It won't catch deliberate attacks, such as
>    Blindtext <font color="red><font size=+22>Hehe I've got you</font>
> because it skips over the second font opener while looking for the
> closer of the first. To catch these attacks you'd need to build a
> source tree of the text in question.
> 
> HTH,
> 
> --
>    >O     Ernest E. Vogelsinger
>    (\)    ICQ #13394035
>     ^     http://www.vogelsinger.at/
> 
> 
> 
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php




