ID:               44868
Updated by:       j...@php.net
Reported By:      colourmusic at gmail dot com
-Status:          Open
+Status:          Feedback
Bug Type:         mbstring related
Operating System: Win XP SP2
PHP Version:      6CVS-2008-04-30 (snap)
New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php6.0-latest.tar.gz

For Windows:

  http://windows.php.net/snapshots/


Previous Comments:
------------------------------------------------------------------------

[2008-04-30 11:45:18] colourmusic at gmail dot com

Description:
------------
I parsed a UTF-8 encoded URL with parse_url() and noticed that the
character ８ (fullwidth digit eight, U+FF18, UTF-8 bytes EF BC 98) is
replaced with the bytes EF BC 5F, which is not a valid UTF-8 sequence.
The script did not generate any errors or warnings.

The fullwidth digits ０ to ７ and ９ are parsed by parse_url() without
any problems.

This bug also appears in PHP 5.2.3 and PHP 5.2.5.

Reproduce code:
---------------
<?php
// mb_convert_encoding() gives the same result as html_entity_decode() in this example
//$url = mb_convert_encoding("https://example.com/?SHAMEI=ランドクルーザー80バン&SHAMEI_CD=01465,", "utf-8", "html-entities");
$url = html_entity_decode("https://example.com/?SHAMEI=ランドクルーザー90バン&SHAMEI_CD=01465,",null,"utf-8");
echo "Original URL = $url <br />\n";
$result = parse_url($url);
// print_r() prints the array itself; wrapping it in echo would also print its return value
print_r($result);
?>

Expected result:
----------------
Original URL = https://example.com/?SHAMEI=ランドクルーザー80バン&SHAMEI_CD=01465,
Array
(
    [scheme] => https
    [host] => example.com
    [path] => /
    [query] => SHAMEI=ランドクルーザー80バン&SHAMEI_CD=01465,
)

Actual result:
--------------
Original URL = https://example.com/?SHAMEI=ランドクルーザー80バン&SHAMEI_CD=01465,
Array
(
    [scheme] => https
    [host] => example.com
    [path] => /
    [query] => ランドクルーザー�_0バン&SHAMEI_CD=01465,


------------------------------------------------------------------------
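
A byte-level check may make the report easier to verify than the rendered
output. The following is only a minimal sketch, not part of the original
report: the helper names hex_bytes() and is_valid_utf8() are hypothetical,
the test URL is shortened, the fullwidth eight is written as explicit bytes
so that it does not depend on the file encoding, and the syntax assumes
PHP 7 or later.

```php
<?php
// Hypothetical helper: render the raw bytes of a string as spaced hex pairs.
function hex_bytes(string $s): string
{
    return strtoupper(implode(' ', str_split(bin2hex($s), 2)));
}

// Hypothetical helper: true if $s is well-formed UTF-8.
// preg_match() with the /u modifier returns false on malformed UTF-8 input.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// U+FF18 (fullwidth digit eight) as explicit UTF-8 bytes.
$fullwidthEight = "\xEF\xBC\x98";
// The byte sequence the report says appears in the parsed query.
$corrupted = "\xEF\xBC\x5F";

echo hex_bytes($fullwidthEight), ' valid UTF-8: ',
    var_export(is_valid_utf8($fullwidthEight), true), "\n";
echo hex_bytes($corrupted), ' valid UTF-8: ',
    var_export(is_valid_utf8($corrupted), true), "\n";

// Compare the query string passed in with the one parse_url() returns.
$query = 'SHAMEI=' . $fullwidthEight . '&SHAMEI_CD=01465';
$parsed = parse_url('https://example.com/?' . $query);
$parsedQuery = $parsed['query'] ?? '';

if ($parsedQuery === $query) {
    echo "query unchanged\n";
} else {
    echo 'query altered: ', hex_bytes($parsedQuery), "\n";
}
```

Whether the last comparison reports an altered query may depend on the PHP
build and platform; the report above observed the problem on Win XP SP2.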