ID:               47108
Comment by:       typoon at gmail dot com
Reported By:      terrafr...@php.net
Status:           Open
Bug Type:         DOM XML related
Operating System: Windows XP
PHP Version:      5.2.8
New Comment:

The likely explanation is that ISO-8859-7 does not assign a character
to 0xAE. When libxml tries to convert the input, it fails on that byte
and raises an error.

References:
http://www.itscj.ipsj.or.jp/ISO-IR/227.pdf
http://en.wikipedia.org/wiki/ISO_8859-7

Checking the PDF you will see that 0xAE is not assigned. Quoting
Wikipedia:
"Code values 00–1F, 7F, 80–9F, AE, D2 and FF are not assigned to
characters by ISO/IEC 8859-7."

More information and other references can also be found on Google.

My 2 cents are that this is not a bug at all. If you still think
it is, then we might need to open a bug report with the libxml team, as
this error is generated inside libxml, not PHP.

Regards,
Henrique


Previous Comments:
------------------------------------------------------------------------

[2009-01-14 20:08:27] terrafr...@php.net

Description:
------------
All HTML after chr(0xAE) (if present) is ignored by DOMDocument's
loadHTML(), even if chr(0xAE) is a valid character per the HTML's
charset.

In the Reproduce code, replace chr(0xAE) with chr(0xAF) or chr(0xAD),
or just remove it altogether, and it works.

Further, if you echo out $str and copy / paste the HTML into
validator.w3.org, it's valid HTML, even with the chr(0xAE).

Reproduce code:
---------------
<?php

$str = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-7">
<title>test</title>
</head>
<body><p>aaaaa' . chr(0xAE) .
'zzzzz</p></body>
</html>';

$xml = new DOMDocument();
$xml->loadHTML($str);
echo $xml->saveHTML();

Expected result:
----------------
aaaaa�zzzzz

Actual result:
--------------
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: input
conversion failed due to input error, bytes 0xAE 0x7A 0x7A 0x7A in
C:\htdocs\test.php on line 14

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: input
conversion failed due to input error, bytes 0xAE 0x7A 0x7A 0x7A in
C:\htdocs\test.php on line 14

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]:
htmlCheckEncoding: encoder error in Entity, line: 4 in
C:\htdocs\test.php on line 14

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: input
conversion failed due to input error, bytes 0xAE 0x7A 0x7A 0x7A in
C:\htdocs\test.php on line 14

aaaaa

------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=47108&edit=1
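
If the parser's behaviour is accepted as correct, one possible workaround is to drop the unassigned byte values before the markup reaches loadHTML(). The sketch below is only an illustration and not part of the original report: the helper name strip_unassigned_iso_8859_7() and the constant ISO_8859_7_UNASSIGNED are hypothetical, the byte list (AE, D2, FF) is taken from the Wikipedia quote in the comment above, and the syntax assumes PHP 7 or later rather than the 5.2.8 the bug was filed against. It silently discards data, so it would only be appropriate where those bytes are known to be noise.

```php
<?php
// Byte values in the 0xA0-0xFF range that ISO/IEC 8859-7 leaves unassigned,
// as listed in the Wikipedia quote above. The C0/C1 control ranges
// (00-1F, 7F, 80-9F) are deliberately left alone here.
const ISO_8859_7_UNASSIGNED = ["\xAE", "\xD2", "\xFF"];

// Hypothetical helper: remove unassigned bytes from an ISO-8859-7 byte string.
// PHP strings are byte strings, so str_replace() works on single bytes here.
function strip_unassigned_iso_8859_7(string $bytes): string
{
    return str_replace(ISO_8859_7_UNASSIGNED, '', $bytes);
}

// Same body text as in the Reproduce code, minus the surrounding markup.
$body  = 'aaaaa' . chr(0xAE) . 'zzzzz';
$clean = strip_unassigned_iso_8859_7($body);

echo $clean, "\n";
```

In the Reproduce code, $str could be passed through such a helper before the loadHTML() call. Whether dropping the byte is acceptable depends on where the input comes from: a stray 0xAE may also mean the page was really authored in a different charset (for example one where 0xAE is the registered sign), in which case correcting the declared charset would be the better fix.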