ID:               46944
 Updated by:       scott...@php.net
 Reported By:      anomie at users dot sourceforge dot net
-Status:           Verified
+Status:           Closed
 Bug Type:         JSON related
 Operating System: Linux
 PHP Version:      5.3CVS-2008-12-26 (snap)
 Assigned To:      scottmac
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.




Previous Comments:
------------------------------------------------------------------------

[2008-12-26 15:39:44] anomie at users dot sourceforge dot net

Description:
------------
json_encode encodes characters above U+1FFFF incorrectly: sometimes it
encodes them as characters in the U+10000-U+1FFFF range, and sometimes
it fails outright and returns null.

Note that this is not caused by the source not being valid UTF-8; as
you can see below, I am building the UTF-8-encoded text byte by byte.

5.2.6 has the same problem, although instead of null it returns "aa"
for those cases due to bug 43941.

It looks like there are actually two unrelated bugs here:
1. utf8_to_utf16 in ext/json/utf8_to_utf16.c should use "c -= 0x10000;"
at line 49 instead of "c &= 0xFFFF;". The mask discards the plane bits,
so every value above U+1FFFF is encoded as if it were in the
U+10000-U+1FFFF range.
2. utf8_decode_next in ext/json/utf8_decode.c should use 0xF8 instead
of 0xF1 at line 168. With 0xF1, UTF-8 characters beginning with an F1
or F3 byte are rejected, which is why those cases error out.

Reproduce code:
---------------
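for ($i = 1; $i <= 16; $i++) {
    // Build the 4-byte UTF-8 encoding of U+nFFFD for plane n = $i.
    $lead = chr(0xf0 | ($i >> 2));
    $next = chr(0x8f | (($i & 3) << 4));
    print json_encode("aa" . $lead . $next . "\xbf\xbdzz") . "\n";
}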

Expected result:
----------------
"aa\ud83f\udffdzz"
"aa\ud87f\udffdzz"
"aa\ud8bf\udffdzz"
"aa\ud8ff\udffdzz"
"aa\ud93f\udffdzz"
"aa\ud97f\udffdzz"
"aa\ud9bf\udffdzz"
"aa\ud9ff\udffdzz"
"aa\uda3f\udffdzz"
"aa\uda7f\udffdzz"
"aa\udabf\udffdzz"
"aa\udaff\udffdzz"
"aa\udb3f\udffdzz"
"aa\udb7f\udffdzz"
"aa\udbbf\udffdzz"
"aa\udbff\udffdzz"


Actual result:
--------------
"aa\ud83f\udffdzz"
"aa\ud83f\udffdzz"
"aa\ud83f\udffdzz"
null
null
null
null
"aa\ud83f\udffdzz"
"aa\ud83f\udffdzz"
"aa\ud83f\udffdzz"
"aa\ud83f\udffdzz"
null
null
null
null
"aa\ud83f\udffdzz"



------------------------------------------------------------------------

