On Sun, Jun 6, 2010 at 10:39 PM, Raymond Irving <xwis...@gmail.com> wrote:

> Hello,
>
> I'm experiencing another issue when attempting to use
> DOMDocument::loadXML() to load the following HTML code:
>
> <?php
> $html = '
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html>
>    <body>
>        <script type="text/javascript">
>            <!--
>            var i = 0, html = "<strong>Bold Text</strong>,Normal Text";
>            document.write(html);
>            i--; // this line causes the parser to fail
>            alert(html);
>            -->
>        </script>
>    </body>
> </html>';
> $dom = new DOMDocument();
> $dom->loadXML($html);
> echo $dom->saveHTML();
> ?>
>
> The parser throws the following error when it encounters "i--" inside the
> <script> tag:
>
> Warning: DOMDocument::loadXML() [domdocument.loadxml]: Comment not
> terminated <!-- var i = 0, html = "<strong>Bold Text< in Entity
>
> If I remove the line "i--" it will load the HTML code just fine.
>
> Any ideas as to why this throws an error?
>
> __
> Raymond
>


A comment declaration starts with "<!" and ends with ">", and can contain any
number of comments of the form --comment-- in between:
http://htmlhelp.com/reference/wilbur/misc/comment.html

You'll see at the bottom of the article that they advocate a simple rule in
comments:
An HTML comment begins with "<!--", ends with "-->" and does not contain "--"
or ">" anywhere in the comment.

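The "--" in "i--" breaks that rule. XML has the same restriction on "--"
inside comments, and loadXML() parses your string as XML, so the parser
rejects the comment.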
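
To see the rule in isolation, here is a minimal sketch that loads two tiny
documents that differ only in whether the comment contains "--". The
documents and variable names are made up for illustration, and the exact
error text depends on your libxml2 version, so I haven't quoted it here.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical test documents: the only difference is "--" inside the comment.
$good = '<root><!-- i is decremented below --><a/></root>';
$bad  = '<root><!-- i--; --><a/></root>';

$dom = new DOMDocument();
$goodLoaded = $dom->loadXML($good); // comment follows the rule
$badLoaded  = $dom->loadXML($bad);  // comment contains "--"

var_dump($goodLoaded, $badLoaded);

// Print whatever the parser reported for the second document.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();
```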

In your case, if you're maintaining the pages, you can either move the
JavaScript into a separate file or wrap it in a CDATA section instead of a
comment. If you're parsing pages you don't maintain, you can strip out the
JavaScript before performing DOM tasks and parse it separately as needed to
avoid potential issues.
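
Here is a rough sketch of the CDATA and strip-it-out options. The sample
markup is a trimmed-down version of yours (I left out the DOCTYPE), the
variable names are mine, and the regular expression is a deliberately simple
assumption: it expects that "</script>" never appears inside the script
body itself.

```php
<?php
// Option 1: wrap the script body in a CDATA section instead of a comment.
// The leading "//" keeps the CDATA markers hidden from the JavaScript engine.
$withCdata = <<<'XML'
<html>
   <body>
       <script type="text/javascript">
           //<![CDATA[
           var i = 0, html = "<strong>Bold Text</strong>,Normal Text";
           document.write(html);
           i--; // "--" is fine here because this is not inside a comment
           alert(html);
           //]]>
       </script>
   </body>
</html>
XML;

$dom = new DOMDocument();
$cdataLoaded = $dom->loadXML($withCdata);

// Option 2: for pages you don't control, remove <script> elements first.
$original = <<<'XML'
<html>
   <body>
       <script type="text/javascript">
           <!--
           var i = 0;
           i--;
           -->
       </script>
       <p>Content</p>
   </body>
</html>
XML;

// Simple pattern: assumes "</script>" does not occur inside the script body.
$stripped = preg_replace('#<script\b[^>]*>.*?</script>#is', '', $original);

$dom2 = new DOMDocument();
$strippedLoaded = $dom2->loadXML($stripped);

var_dump($cdataLoaded, $strippedLoaded);
```

With the first option the script stays in the document, so you can still read
it from the DOM. With the second it is gone, so keep the original string
around if you need to inspect the JavaScript later.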

Adam

-- 
Nephtali:  PHP web framework that functions beautifully
http://nephtaliproject.com
