Ernest,

Thanks very much. This is pretty close to what I'm looking for. The only
problem is that it doesn't catch mismatched tags nested inside a matched
pair. For example, "some <b>text </u>some</b> text some text" makes it
through without error since, I think, after finding the <b> and its
closing </b>, the function resumes matching after the </b>, so the stray
</u> in between is never checked.

Andy
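One possible way to catch cases like the example above is to walk through every suspect opener and closer in order and keep the open tags on a stack, so that a closer is always compared against the most recent opener. The following is only a minimal sketch under some assumptions: the function name find_tag_mismatch is made up for illustration and is not from this thread, it assumes PHP 7.1 or later, and it does not handle attribute values that contain ">".

```php
<?php
// Hypothetical helper (not from the original thread): returns null when all
// suspect tags are properly nested, otherwise a short description of the
// first problem found.
function find_tag_mismatch(string $text): ?string
{
    $suspect_tags = 'b|i|u|strong|em|font|a|ol|ul|blockquote';
    // Match openers and closers alike. Group 1 is "/" for a closer and
    // empty for an opener; group 2 is the tag name. The \b keeps "b" from
    // matching the start of "<br>" or "<blockquote>".
    $re = '/<\s*(\/?)\s*(' . $suspect_tags . ')\b[^>]*>/i';

    $stack = [];
    preg_match_all($re, $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            // Opener: remember it until its closer shows up.
            $stack[] = $name;
        } elseif (array_pop($stack) !== $name) {
            // Closer that does not match the most recent opener
            // (or a closer with nothing open at all).
            return "unexpected </$name>";
        }
    }
    // Anything left on the stack was opened but never closed.
    return $stack ? 'unclosed <' . end($stack) . '>' : null;
}
```

With this sketch, find_tag_mismatch('some <b>text </u>some</b> text some text') returns "unexpected </u>", while properly nested input returns null.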
> -----Original Message-----
> From: Ernest E Vogelsinger [mailto:[EMAIL PROTECTED]
> Sent: Saturday, February 22, 2003 5:48 AM
> To: Andy Crain
> Cc: [EMAIL PROTECTED]
> Subject: Re: [PHP] preg_match question: locating unmatched HTML tags
>
> At 03:35 22.02.2003, Andy Crain said:
> --------------------[snip]--------------------
> >My apologies in advance if this too basic or there's a solution easily
> >found out there, but after lots of searching, I'm still lost.
> >
> >I'm trying to build a regexp that would parse user-supplied text and
> >identify cases where HTML tags are left open or are not properly
> >matched, e.g., <b> tags without closing </b> tags. This is for a sort of
> >message board type of application, and I'd like to allow users to use
> >some HTML, but just would like to check to ensure that no stray tags are
> >input that would screw up the rest of the page's display. I'm new to
> >regular expressions, and the one below is as far as I've gotten. If
> >anyone has any suggestions, they'd be very much appreciated.
> >
> >$suspect_tags = "b|i|u|strong|em|font|a|ol|ul|blockquote ";
> >$pattern = '/<(' . $suspect_tags . '[^>]*>)(.*)(?!<\/\1)/Ui';
> >if (preg_match($pattern,$_POST['entry'],$matches)) {
> >    //do something to report the unclosed tags
> >} else {
> >    echo 'Input looks fine. No unmatched tags.';
> >}
> --------------------[snip]--------------------
>
> Hi,
>
> I don't believe you can create a regular expression to look for something
> that's NOT there.
>
> I'd take this approach (tested with drawbacks, see below):
>
> function check_tags($text) {
>     $suspect_tags = "b|i|u|strong|em|font|a|ol|ul|blockquote";
>     $re_find = '/<\s*(' . $suspect_tags . ').*?>(.*)/is';
>
>     while (preg_match($re_find,$text,$matches)) {
>         // a suspect tag was found, check if closed
>         $suspect = $matches[1];
>         $text = $matches[2];
>         $re_close = '/<\s*\/\s*' . $suspect . '\s*?>(.*)/is';
>         if (preg_match($re_close, $text, $matches)) {
>             // fine, found matching closer, continue loop
>             $text = $matches[1];
>         }
>         else {
>             // not closed - return to report it
>             return $suspect;
>         }
>     }
>     return null;
> }
>
> $text = <<<EOT
> This text contains < font
> size=+4 > an
> unclosed suspect </fint>tag.
>
> EOT;
>
> $tag = check_tags($text);
> if ($tag) echo "Unmatched: \"$tag\"\n";
> else echo "Perfect!\n";
>
> The drawbacks: This approach is softly targeted at unintended typos, such
> as in the example text. It won't catch deliberate attacks, such as
>     Blindtext <font color="red><font size=+22>Hehe I've got you</font>
> because it is missing the second font opener. To catch these attacks you'd
> need to build a source tree of the text in question.
>
> HTH,
>
> --
>    >O     Ernest E. Vogelsinger
>    (\)    ICQ #13394035
>     ^     http://www.vogelsinger.at/
>
>
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php