Edit report at http://bugs.php.net/bug.php?id=51903&edit=1
ID:               51903
Patch added by:   m...@php.net
Reported by:      phpwnd at gmail dot com
Summary:          simplexml_load_file() doesn't use HTTP headers
Status:           Open
Type:             Bug
Package:          SimpleXML related
PHP Version:      5.3.2

New Comment:

The following patch has been added/updated:

Patch Name: check_stream-wrapperdata_for_encoding
Revision:   1274878924
URL:        http://bugs.php.net/patch-display.php?bug=51903&patch=check_stream-wrapperdata_for_encoding&revision=1274878924

Previous Comments:
------------------------------------------------------------------------
[2010-05-25 07:16:56] phpwnd at gmail dot com

Description:
------------
Seen at http://stackoverflow.com/questions/2899274/

If you use simplexml_load_file() to load a remote document via HTTP,
SimpleXML assumes that the content is UTF-8 regardless of the HTTP headers.
In the test script below, at the time of writing, Google's web server
returns something like:

-------------
HTTP/1.1 200 OK
Content-Type: text/xml; charset=GB2312
Date: Tue, 25 May 2010 05:05:17 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate expires=Thu, 24-May-2012 05:05:17 GMT; path=/; domain=.google.com
X-Content-Type-Options: nosniff
Server: igfe
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked

<?xml version="1.0"?><xml_api_reply version="1">
<!-- single-byte encoded GB2312 stuff -->
</xml_api_reply>
-------------

The server advertises the content "text/xml; charset=GB2312", but since the
XML declaration doesn't mention the encoding, SimpleXML assumes it is UTF-8
and eventually fails to load it.

If it is at all possible, SimpleXML (and DOM, I assume) should look at the
HTTP headers to find the document's encoding.

Test script:
---------------
simplexml_load_file('http://www.google.com/ig/api?weather=11791&hl=zh-CN');

Actual result:
--------------
PHP Warning:  simplexml_load_file(): http://www.google.com/ig/api?weather=11791&hl=zh-CN:1: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xC7 0xE7 0x22 0x2F in Command line code on line 1

Warning: simplexml_load_file(): http://www.google.com/ig/api?weather=11791&hl=zh-CN:1: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xC7 0xE7 0x22 0x2F in Command line code on line 1
PHP Warning:  simplexml_load_file(): t_system data="SI"/></forecast_information><current_conditions><condition data=" in Command line code on line 1

Warning: simplexml_load_file(): t_system data="SI"/></forecast_information><current_conditions><condition data=" in Command line code on line 1
PHP Warning:  simplexml_load_file(): ^ in Command line code on line 1

Warning: simplexml_load_file():
------------------------------------------------------------------------
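
For illustration only, here is a rough userland sketch of the behaviour requested in the description. It is not the attached patch, and the function names charset_from_headers() and load_xml_using_http_charset() are made up for this example. It assumes the iconv extension is available and recognises the advertised charset. It also assumes that an encoding named in the XML declaration should win over the HTTP header, which is a conservative choice for this sketch rather than anything stated in the report.

```php
<?php
/**
 * Hypothetical helper (not part of SimpleXML): return the charset parameter
 * of the last Content-Type header in a list of raw header lines, or null.
 */
function charset_from_headers(array $headers)
{
    $charset = null;
    foreach ($headers as $line) {
        if (preg_match('/^Content-Type:.*?;\s*charset\s*=\s*"?([^";\s]+)/i', $line, $m)) {
            $charset = $m[1]; // a later header (e.g. after a redirect) overrides
        }
    }
    return $charset;
}

/**
 * Hypothetical helper: transcode $body to UTF-8 using the HTTP charset, but
 * only when the XML declaration does not name an encoding itself.
 * Returns a SimpleXMLElement, or false if parsing fails.
 */
function load_xml_using_http_charset($body, $charset)
{
    $declaresEncoding = preg_match('/^\s*<\?xml[^>]*\bencoding\s*=/i', $body) === 1;

    if ($charset !== null && !$declaresEncoding && strcasecmp($charset, 'UTF-8') !== 0) {
        // @ hides the diagnostic iconv raises for an unknown charset name;
        // in that case the body is left untouched.
        $converted = @iconv($charset, 'UTF-8', $body);
        if ($converted !== false) {
            $body = $converted;
        }
    }

    return simplexml_load_string($body);
}

// Possible usage with the http:// stream wrapper, which fills
// $http_response_header in the calling scope after file_get_contents():
//
//   $body = file_get_contents($url);
//   $xml  = load_xml_using_http_charset($body, charset_from_headers($http_response_header));
```

With the response shown in the description, charset_from_headers() would return "GB2312", and the body would be converted to UTF-8 before libxml sees it, which should avoid the "Input is not proper UTF-8" error.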