Edit report at https://bugs.php.net/bug.php?id=43771&edit=1

 ID:                 43771
 Comment by:         it dot registrations at symphony-group dot co dot uk
 Reported by:        missingno at ifrance dot com
 Summary:            validateOnParse & validate() give difference results
 Status:             Not a bug
 Type:               Bug
 Package:            DOM XML related
 Operating System:   Ubuntu 7.10
 PHP Version:        5.3CVS-2008-01-06 (snap)
 Block user comment: N
 Private report:     N

 New Comment:

I feel this is a bug, and it still exists as of PHP 5.3.8 on OS X 10.7.3 and
PHP 5.4.3 compiled on Fedora 15.


Previous Comments:
------------------------------------------------------------------------
[2008-01-23 11:24:48] rricha...@php.net

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

validation only works for XML data, so XHTML DOCTYPE needed.
Also, in order to use validateOnParse, you need to load via loadXML()
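
A minimal sketch of the setup described above, not taken from the original
report: it assumes a hypothetical <note> document carrying an internal DTD
subset (so nothing is fetched over the network) and loads it with loadXML().
Under those assumptions, both validateOnParse and validate() would be
expected to report validity errors for the same document.

```php
<?php
// Hypothetical document: the internal DTD says <note> must contain
// exactly one <to>, but the body contains an undeclared <from>.
$xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>Bar</from></note>
XML;

libxml_use_internal_errors(true);

// Validate while parsing: requires loadXML(), not loadHTML().
libxml_clear_errors();
$dd1 = new DOMDocument();
$dd1->validateOnParse = true;
$dd1->loadXML($xml);
$onParse = libxml_get_errors();

// Parse without validation, then validate explicitly.
libxml_clear_errors();
$dd2 = new DOMDocument();
$dd2->loadXML($xml);
$isValid = $dd2->validate();
$afterLoad = libxml_get_errors();

// Print the messages from both runs for comparison.
foreach (array('validateOnParse' => $onParse, 'validate()' => $afterLoad) as $label => $errors) {
    echo "Using $label:\n";
    foreach ($errors as $error) {
        echo '  ', trim($error->message), "\n";
    }
}
```

The messages from the two runs can then be compared side by side.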

------------------------------------------------------------------------
[2008-01-06 17:18:38] missingno at ifrance dot com

Description:
------------
From what I understand, setting DOMDocument::validateOnParse = TRUE and
calling DOMDocument::validate() should return the same list of errors for a
given document.

But when dealing with HTML code, validate() seems to fail to deal with the DTD 
correctly.

Therefore, using validate() & validateOnParse gives different results.

I think that in the case of validate(), PHP calls libxml with options that make 
it think it will be dealing with XML code and thus, an XML DTD. So, once libxml 
loads the HTML DTD, it fails to parse it correctly and returns a bunch of 
errors.

(HTML & XML DTDs are similar, except that HTML ones allow for some more 
constructs like the '&' operator in ELEMENT declarations)
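
To illustrate the point about SGML-only constructs, here is a small sketch
(the element names a, b and c are made up for the example): an internal DTD
subset that uses the '&' connector is handed to the XML parser via loadXML().
The XML parser is expected to reject the declaration, since XML content
models only allow ',' and '|'.

```php
<?php
// Hypothetical DTD using the SGML '&' connector ("b and c, in any order"),
// which is not part of the XML DTD syntax.
$xml = <<<XML
<!DOCTYPE a [
  <!ELEMENT a (b & c)>
  <!ELEMENT b EMPTY>
  <!ELEMENT c EMPTY>
]>
<a><b/><c/></a>
XML;

libxml_use_internal_errors(true);
libxml_clear_errors();

$dd = new DOMDocument();
$loaded = $dd->loadXML($xml);
$errors = libxml_get_errors();

// Show whether the load succeeded and what libxml complained about.
var_dump($loaded);
foreach ($errors as $error) {
    echo trim($error->message), "\n";
}
```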

Reproduce code:
---------------
<?php
/* Note: removing the system identifier doesn't change the result,
 * except that "errors" in the DTD are caught immediately.
 * (My guess would be that libxml tries to fetch the DTD from the
 * system identifier instead of using a catalog for resolution) */
$markup = <<<HTML
<!DOCTYPE HTML PUBLIC
    "-//W3C//DTD HTML 4.0 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
    <head><title>Foo</title></head>
    <body><p>Bar</p></body>
</html>
HTML;

header('Content-Type: text/plain');
libxml_use_internal_errors(TRUE);

// First, using DOMDocument::validateOnParse.
libxml_clear_errors();
$dd1 = new DOMDocument();
$dd1->validateOnParse      = TRUE;

echo "Using validateOnParse:\n";
$dd1->loadHTML($markup);
var_dump(libxml_get_errors());

echo "\n\n";

// Now, using DOMDocument::validate() instead.
libxml_clear_errors();
$dd2 = new DOMDocument();
$dd2->validateOnParse      = FALSE;

echo "Using validate():\n";
$dd2->loadHTML($markup);
$dd2->validate();
var_dump(libxml_get_errors());

?>

Expected result:
----------------
Using validateOnParse:
array(0) {
}


Using validate():
array(0) {
}


Actual result:
--------------
Using validateOnParse:
array(0) {
}


Using validate():
array(3) {
  [0]=>
  object(LibXMLError)#3 (6) {
    ["level"]=>
    int(3)
    ["code"]=>
    int(37)
    ["column"]=>
    int(1)
    ["message"]=>
    string(55) "xmlParseEntityDecl: entity HTML.Version not terminated
"
    ["file"]=>
    string(36) "http://www.w3.org/TR/html4/loose.dtd"
    ["line"]=>
    int(31)
  }
  [1]=>
  object(LibXMLError)#4 (6) {
    ["level"]=>
    int(3)
    ["code"]=>
    int(60)
    ["column"]=>
    int(1)
    ["message"]=>
    string(37) "Content error in the external subset
"
    ["file"]=>
    string(36) "http://www.w3.org/TR/html4/loose.dtd"
    ["line"]=>
    int(31)
  }
  [2]=>
  object(LibXMLError)#5 (6) {
    ["level"]=>
    int(2)
    ["code"]=>
    int(517)
    ["column"]=>
    int(0)
    ["message"]=>
    string(74) "Could not load the external subset "http://www.w3.org/TR/html4/loose.dtd"
"
    ["file"]=>
    string(0) ""
    ["line"]=>
    int(0)
  }
}



------------------------------------------------------------------------


