 ID:               46944
 Updated by:       scott...@php.net
 Reported By:      anomie at users dot sourceforge dot net
-Status:           Verified
+Status:           Closed
 Bug Type:         JSON related
 Operating System: Linux
 PHP Version:      5.3CVS-2008-12-26 (snap)
 Assigned To:      scottmac
 New Comment:

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

Thank you for the report, and for helping us make PHP better.


Previous Comments:
------------------------------------------------------------------------

[2008-12-26 15:39:44] anomie at users dot sourceforge dot net

Description:
------------
json_encode encodes characters above U+1FFFF incorrectly; sometimes it
incorrectly encodes them as characters in the U+10000-U+1FFFF range, and
sometimes it just errors out.

Note this is not an error with the source not being UTF8; as you can see
below, I am building the UTF8-encoded text byte-by-byte.

5.2.6 has the same problem, although instead of null it returns "aa" for
those cases due to bug 43941.

It looks like there are actually two unrelated bugs here:

1. utf8_to_utf16 in ext/json/utf8_to_utf16.c should use "c -= 0x10000;"
   at line 49 instead of "c &= 0xFFFF;". This causes the part where it
   incorrectly encodes values over U+1FFFF as U+10000-U+1FFFF.

2. utf8_decode_next in ext/json/utf8_decode.c should use 0xF8 instead of
   0xF1 at line 168. This causes the part where UTF8 characters beginning
   with an F1 or F3 byte error out.

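To illustrate the two points above, here is a minimal C sketch. It is not the ext/json source: the function names (to_surrogates, to_surrogates_masked, is_four_byte_lead) are made up for this example, and the lead-byte test is assumed to have the form "(b & mask) == 0xF0", which is only a guess at what line 168 looks like.

```c
#include <stdint.h>

/* Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair.
 * Subtracting 0x10000 keeps all 20 payload bits (the proposed fix). */
void to_surrogates(uint32_t c, uint16_t *hi, uint16_t *lo)
{
    c -= 0x10000;
    *hi = (uint16_t)(0xD800 | (c >> 10));   /* top 10 bits */
    *lo = (uint16_t)(0xDC00 | (c & 0x3FF)); /* low 10 bits */
}

/* Same split, but masking with 0xFFFF as in the reported behaviour.
 * Everything above bit 15 is discarded, so U+2FFFD, U+3FFFD, ...
 * all collapse onto the pair for U+1FFFD. */
void to_surrogates_masked(uint32_t c, uint16_t *hi, uint16_t *lo)
{
    c &= 0xFFFF;
    *hi = (uint16_t)(0xD800 | (c >> 10));
    *lo = (uint16_t)(0xDC00 | (c & 0x3FF));
}

/* Hypothetical lead-byte check for a 4-byte UTF-8 sequence (11110xxx).
 * With mask 0xF8 every byte F0..F7 passes; with mask 0xF1 the odd
 * bytes F1, F3, ... fail because their lowest bit is set. */
int is_four_byte_lead(unsigned char b, unsigned char mask)
{
    return (b & mask) == 0xF0;
}
```

For U+2FFFD, to_surrogates yields D87F DFFD while to_surrogates_masked yields D83F DFFD, and is_four_byte_lead(0xF1, 0xF1) is false while is_four_byte_lead(0xF1, 0xF8) is true, which lines up with the wrong pairs and the nulls in the report.
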
Reproduce code:
---------------
for($i=1; $i<=16; $i++){
    print json_encode("aa".chr(0xf0|($i>>2)).chr(0x8f|($i&3)<<4)."\xbf\xbdzz")."\n";
}

Expected result:
----------------
"aa\ud83f\udffdzz"
"aa\ud87f\udffdzz"
"aa\ud8bf\udffdzz"
"aa\ud8ff\udffdzz"
"aa\ud93f\udffdzz"
"aa\ud97f\udffdzz"
"aa\ud9bf\udffdzz"
"aa\ud9ff\udffdzz"
"aa\uda3f\udffdzz"
"aa\uda7f\udffdzz"
"aa\udabf\udffdzz"
"aa\udaff\udffdzz"
"aa\udb3f\udffdzz"
"aa\udb7f\udffdzz"
"aa\udbbf\udffdzz"
"aa\udbff\udffdzz"

Actual result:
--------------
"aa\ud83f\udffdzz"
"aa\ud83f\udffdzz"
"aa\ud83f\udffdzz"
null
null
null
null
"aa\ud83f\udffdzz"
"aa\ud83f\udffdzz"
"aa\ud83f\udffdzz"
"aa\ud83f\udffdzz"
null
null
null
null
"aa\ud83f\udffdzz"

------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=46944&edit=1