Edit report at https://bugs.php.net/bug.php?id=62010&edit=1
 ID:                 62010
 Comment by:         votefordevnull at gmail dot com
 Reported by:        tklingenberg at lastflood dot net
 Summary:            json_decode produces invalid byte-sequences
 Status:             Open
 Type:               Bug
 Package:            JSON related
 Operating System:   Windows
 PHP Version:        5.3.13
 Block user comment: N
 Private report:     N

 New Comment:

Successfully reproduced on Linux

Previous Comments:
------------------------------------------------------------------------
[2012-05-11 22:46:34] tklingenberg at lastflood dot net

Looks like #41067 (https://bugs.php.net/bug.php?id=41067) was not fully fixed.

------------------------------------------------------------------------
[2012-05-11 22:12:42] tklingenberg at lastflood dot net

Description:
------------
It's a typical case the JSON *and* UTF-16 specifications warn about: decoding of non-existent UTF-16 code points:

    json_decode('"\ud834"')

should give NULL because \ud834 is *invalid*. But instead it starts some party, gets boozed and offers this as a UTF-8 byte sequence:

    1110 1101  1010 0000  1011 0100    bytes produced (ED A0 B4)
    1110 xxxx  10xx xxxx  10xx xxxx    three-byte UTF-8 pattern
    1101 1000 0011 0100                payload bits (D8 34)

U+D834 is not a valid Unicode character (it is a lone high surrogate).

Test script:
---------------
<?php
if (NULL !== json_decode('"\ud834"')) {
    echo "json_decode is still broken.";
}

Expected result:
----------------
NULL because the JSON is invalid.

Actual result:
--------------
PHP tries to create UTF-8 out of it and fails, producing an invalid UTF-8 byte sequence.

------------------------------------------------------------------------
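
The bit layout in the description can be reproduced by hand. The following is a minimal sketch, not the code json_decode actually runs: `naive_utf8_3byte()` and `is_surrogate()` are hypothetical helpers introduced here only for illustration. It shows how pushing U+D834 through the plain three-byte UTF-8 pattern, without a surrogate check, yields the bytes ED A0 B4, and what a range check that would catch this case might look like.

```php
<?php
// Hypothetical helper (not part of PHP): encode a code point in the range
// U+0800..U+FFFF with the plain three-byte UTF-8 bit pattern
// 1110xxxx 10xxxxxx 10xxxxxx, without rejecting surrogates.
function naive_utf8_3byte($cp)
{
    return chr(0xE0 | ($cp >> 12))            // top 4 bits
         . chr(0x80 | (($cp >> 6) & 0x3F))    // middle 6 bits
         . chr(0x80 | ($cp & 0x3F));          // low 6 bits
}

// Hypothetical helper: true for U+D800..U+DFFF, the range reserved for
// UTF-16 surrogate pairs, which must not be encoded on their own.
function is_surrogate($cp)
{
    return $cp >= 0xD800 && $cp <= 0xDFFF;
}

$bytes = naive_utf8_3byte(0xD834);
echo strtoupper(bin2hex($bytes)), "\n";   // prints EDA0B4
var_dump(is_surrogate(0xD834));           // bool(true)
```

A decoder that applied a check like `is_surrogate()` to an unpaired \uXXXX escape could reject the input instead of emitting these three bytes.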