ID:               47108
 Comment by:       typoon at gmail dot com
 Reported By:      terrafr...@php.net
 Status:           Open
 Bug Type:         DOM XML related
 Operating System: Windows XP
 PHP Version:      5.2.8
 New Comment:

The explanation might be that ISO-8859-7 does not assign any character
to the byte 0xAE. When libxml tries to convert that byte, the conversion
fails and libxml reports an error.
References:
http://www.itscj.ipsj.or.jp/ISO-IR/227.pdf
http://en.wikipedia.org/wiki/ISO_8859-7

If you check the PDF, you will see that 0xAE is not assigned.
Quoting wikipedia:
"Code values 00–1F, 7F, 80–9F, AE, D2 and FF are not assigned to
characters by ISO/IEC 8859-7."
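
A quick way to check this locally is a small sketch that asks PHP's
iconv extension (assumed to be installed) to convert single bytes from
ISO-8859-7 to UTF-8. The helper name isAssignedInIso88597 is
hypothetical, and the result may depend on the iconv library PHP is
linked against (glibc or GNU libiconv are assumed here):

```php
<?php
// Sketch: returns true if $byte maps to a character in ISO-8859-7.
// Assumption: iconv() returns false for a byte with no mapping in the
// source charset; the notice it raises is suppressed with @.
function isAssignedInIso88597($byte)
{
    return @iconv('ISO-8859-7', 'UTF-8', chr($byte)) !== false;
}

// 0xAE is the byte from the report; 0xAD and 0xAF are its neighbours,
// which the reporter says load without problems.
foreach (array(0xAD, 0xAE, 0xAF) as $byte) {
    printf("0x%02X: %s\n", $byte,
        isAssignedInIso88597($byte) ? 'assigned' : 'not assigned');
}
```

On such a setup, only 0xAE should be reported as not assigned, which
matches the warnings libxml emits.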

More information and other references can also be found on Google.
So my 2 cents are that this is not a bug at all.
If you still think it is, then we might need to open a bug report with
the libxml team, as this error is generated inside libxml, not PHP.

Regards,

Henrique


Previous Comments:
------------------------------------------------------------------------

[2009-01-14 20:08:27] terrafr...@php.net

Description:
------------
All HTML after chr(0xAE) (if present) is ignored by DOMDocument's
loadHTML(), even if chr(0xAE) is a valid character per the HTML's
charset. In the reproduce code, replace chr(0xAE) with chr(0xAF) or
chr(0xAD), or just remove it altogether, and it works. Further, if you
echo out $str and copy/paste the HTML into validator.w3.org, it
validates as HTML, even with the chr(0xAE).

Reproduce code:
---------------
<?php
$str = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html;
charset=iso-8859-7">
<title>test</title>
</head>
<body><p>aaaaa' . chr(0xAE) . 'zzzzz</p></body>
</html>';

$xml = new DOMDocument();
$xml->loadHTML($str); // libxml decodes the input as iso-8859-7 (from the meta tag)
echo $xml->saveHTML(); // dump what the parser kept

Expected result:
----------------
aaaaa&#65533;zzzzz

Actual result:
--------------
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: input
conversion failed due to input error, bytes 0xAE 0x7A 0x7A 0x7A in
C:\htdocs\test.php on line 14

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: input
conversion failed due to input error, bytes 0xAE 0x7A 0x7A 0x7A in
C:\htdocs\test.php on line 14

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]:
htmlCheckEncoding: encoder error in Entity, line: 4 in
C:\htdocs\test.php on line 14

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: input
conversion failed due to input error, bytes 0xAE 0x7A 0x7A 0x7A in
C:\htdocs\test.php on line 14

aaaaa


------------------------------------------------------------------------

