Edit report at http://bugs.php.net/bug.php?id=51903&edit=1

 ID:             51903
 Patch added by: m...@php.net
 Reported by:    phpwnd at gmail dot com
 Summary:        simplexml_load_file() doesn't use HTTP headers
 Status:         Open
 Type:           Bug
 Package:        SimpleXML related
 PHP Version:    5.3.2

 New Comment:

The following patch has been added/updated:

Patch Name: check_stream-wrapperdata_for_encoding
Revision:   1274878924
URL:        http://bugs.php.net/patch-display.php?bug=51903&patch=check_stream-wrapperdata_for_encoding&revision=1274878924


Previous Comments:
------------------------------------------------------------------------
[2010-05-25 07:16:56] phpwnd at gmail dot com

Description:
------------
Seen at http://stackoverflow.com/questions/2899274/



If you use simplexml_load_file() to load a remote document via HTTP,
SimpleXML assumes that the content is UTF-8 regardless of the HTTP
headers. In the test script below, at the time of writing, Google's web
server returns something like:



-------------
HTTP/1.1 200 OK
Content-Type: text/xml; charset=GB2312
Date: Tue, 25 May 2010 05:05:17 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate
expires=Thu, 24-May-2012 05:05:17 GMT; path=/; domain=.google.com
X-Content-Type-Options: nosniff
Server: igfe
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked

<?xml version="1.0"?><xml_api_reply version="1">
<!-- GB2312-encoded stuff -->
</xml_api_reply>
-------------



The server advertises the content as "text/xml; charset=GB2312", but
since the XML declaration doesn't declare an encoding, SimpleXML assumes
the document is UTF-8 and fails to parse it.



If it is at all possible, SimpleXML (and DOM, I assume) should look at
the HTTP headers to find the document's encoding.
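
Until that happens, a userland workaround along these lines might help.
This is only a minimal sketch, not what the attached patch does: the
helper names charset_from_headers() and load_xml_with_http_charset() are
made up for illustration, and it assumes the iconv extension is available
and knows the advertised charset. The idea is to read the charset from
the Content-Type header and convert the body to UTF-8 before parsing,
unless the XML declaration already names an encoding.

```php
<?php
// Hypothetical helper: extract the charset parameter from raw HTTP
// response headers (e.g. the lines the HTTP wrapper exposes in
// $http_response_header). Returns null if no charset is advertised.
function charset_from_headers(array $headers)
{
    $charset = null;
    foreach ($headers as $header) {
        if (preg_match('/^Content-Type:.*;\s*charset="?([^";\s]+)"?/i', $header, $m)) {
            // Keep the last match: after redirects, later headers win.
            $charset = $m[1];
        }
    }
    return $charset;
}

// Hypothetical helper: convert the body to UTF-8 when the XML declaration
// does not name an encoding itself, then hand it to SimpleXML.
function load_xml_with_http_charset($body, $charset)
{
    $declaresEncoding = preg_match('/^\s*<\?xml[^>]*\bencoding\s*=/i', $body) === 1;
    if ($charset !== null && !$declaresEncoding && strcasecmp($charset, 'UTF-8') !== 0) {
        $converted = iconv($charset, 'UTF-8', $body);
        if ($converted !== false) {
            $body = $converted;
        }
    }
    return simplexml_load_string($body);
}

// Usage (needs network access and allow_url_fopen):
// $body = file_get_contents('http://www.google.com/ig/api?weather=11791&hl=zh-CN');
// $xml  = load_xml_with_http_charset($body, charset_from_headers($http_response_header));
```

If the document does carry its own encoding declaration, the sketch
leaves the body untouched and lets the parser handle it.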

Test script:
---------------
simplexml_load_file('http://www.google.com/ig/api?weather=11791&hl=zh-CN');

Actual result:
--------------
PHP Warning:  simplexml_load_file():
http://www.google.com/ig/api?weather=11791&hl=zh-CN:1: parser error :
Input is not proper UTF-8, indicate encoding !
Bytes: 0xC7 0xE7 0x22 0x2F in Command line code on line 1

Warning: simplexml_load_file():
http://www.google.com/ig/api?weather=11791&hl=zh-CN:1: parser error :
Input is not proper UTF-8, indicate encoding !
Bytes: 0xC7 0xE7 0x22 0x2F in Command line code on line 1

PHP Warning:  simplexml_load_file(): t_system
data="SI"/></forecast_information><current_conditions><condition data="
in Command line code on line 1

Warning: simplexml_load_file(): t_system
data="SI"/></forecast_information><current_conditions><condition data="
in Command line code on line 1

PHP Warning:  simplexml_load_file():
                                           ^ in Command line code on line 1

Warning: simplexml_load_file():


------------------------------------------------------------------------


