At 03:35 22.02.2003, Andy Crain said:
--------------------[snip]--------------------
>My apologies in advance if this too basic or there's a solution easily
>found out there, but after lots of searching, I'm still lost.
>
>I'm trying to build a regexp that would parse user-supplied text and
>identify cases where HTML tags are left open or are not properly
>matched-e.g., <b> tags without closing </b> tags. This is for a sort of
>message board type of application, and I'd like to allow users to use
>some HTML, but just would like to check to ensure that no stray tags are
>input that would screw up the rest of the page's display. I'm new to
>regular expressions, and the one below is as far as I've gotten. If
>anyone has any suggestions, they'd be very much appreciated.
>
>$suspect_tags = "b|i|u|strong|em|font|a|ol|ul|blockquote ";
>$pattern = '/<(' . $suspect_tags . '[^>]*>)(.*)(?!<\/\1)/Ui';
>if (preg_match($pattern,$_POST['entry'],$matches)) {
> //do something to report the unclosed tags
>} else {
> echo 'Input looks fine. No unmatched tags.';
>}
--------------------[snip]--------------------
Hi,
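I don't believe a single regular expression can reliably detect a closing
tag that's NOT there. The negative lookahead in your pattern only tests one
position, and the ungreedy (.*) can always stop at a position where that
test passes, so the pattern matches nearly any input that contains an
opening tag.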
I'd take this approach (tested with drawbacks, see below):
function check_tags($text) {
    $suspect_tags = "b|i|u|strong|em|font|a|ol|ul|blockquote";
    // \b keeps "b" from matching the start of <blockquote> or <br>
    $re_find = '/<\s*(' . $suspect_tags . ')\b[^>]*>(.*)/is';
    while (preg_match($re_find, $text, $matches)) {
        // a suspect tag was found, check if closed
        $suspect = $matches[1];
        $text = $matches[2];
        $re_close = '/<\s*\/\s*' . $suspect . '\s*>(.*)/is';
        if (preg_match($re_close, $text, $matches)) {
            // fine, found matching closer, continue with the rest
            $text = $matches[1];
        }
        else {
            // not closed - return to report it
            return $suspect;
        }
    }
    // no unclosed suspect tag found
    return null;
}
$text = <<<EOT
This text contains < font
size=+4 > an
unclosed suspect </fint>tag.
EOT;
$tag = check_tags($text);
if ($tag) echo "Unmatched: \"$tag\"\n";
else echo "Perfect!\n";
The drawbacks: this approach is aimed at unintended typos, such as the one
in the example text. It won't catch deliberate attacks, such as
Blindtext <font color="red><font size=+22>Hehe I've got you</font>
because the search for the closing tag skips right over the second font
opener, which is then never checked. To catch these attacks you'd need to
build a source tree of the text in question, i.e. actually parse how the
tags nest.
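
As a rough illustration of that idea, here is a minimal sketch that keeps a
stack of open tags instead of a full tree. It is not what I tested above:
the function name check_tags_stack is made up for this example, it assumes
PHP 7.1 or later for the type hints, and it still ignores ">" characters
inside quoted attribute values.

```php
<?php
// Hypothetical helper (not part of the tested code above): returns the name
// of the first suspect tag that is unbalanced, or null if all of them pair up.
function check_tags_stack(string $text): ?string {
    $suspect_tags = 'b|i|u|strong|em|font|a|ol|ul|blockquote';
    // Group 1 is "/" for a closing tag (empty for an opener), group 2 the name.
    $re = '/<\s*(\/?)\s*(' . $suspect_tags . ')\b[^>]*>/i';
    preg_match_all($re, $text, $matches, PREG_SET_ORDER);

    $open = []; // stack of currently open tag names
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            $open[] = $name;          // opener: push onto the stack
        } elseif (array_pop($open) !== $name) {
            return $name;             // closer that does not match the last opener
        }
    }
    // Anything still on the stack was never closed.
    return $open ? end($open) : null;
}
```

Called on the attack text above, this reports "font", because the second
opener stays on the stack once the single closer has been consumed.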
HTH,
--
>O Ernest E. Vogelsinger
(\) ICQ #13394035
^ http://www.vogelsinger.at/