From:             olivier at oxeva dot fr
Operating system: Linux
PHP version:      5.3SVN-2009-09-23 (snap)
PHP Bug Type:     mbstring related
Bug description:  mbstring default behaviour is unexpected

Description:
------------
Default behaviour of mbstring concerning the encoding of PHP files is unexpected:
if the source file is in UTF-8 (with BOM), the file is converted to latin1
(mbstring.internal_encoding has its default value).

Reproduce code:
---------------
Here are all tests done on a file named "testmbstring.php":
http://www.ajeux.com/phptests/testmbstring.phps
(md5sum: e663d28964a20ec404e68226effc27d0)

1/ PHP 5.3.0 WITHOUT mbstring
'./configure' '--enable-zend-multibyte'
(on a file without the var_dump())

Hexdump:
00000000  31 32 33 c3 a9 31 32 33  e2 82 ac 31 32 33        |123..123...123|
0000000e

=> result is OK, non-latin1 chars are encoded as 2 bytes and 3 bytes

2/ PHP 5.3.0 WITH mbstring
'./configure' '--enable-zend-multibyte' '--enable-mbstring'

a) no specific parameters in the .ini

php -i returned the following:
mbstring.detect_order => no value => no value
mbstring.encoding_translation => Off => Off
mbstring.func_overload => 0 => 0
mbstring.http_input => pass => pass
mbstring.http_output => pass => pass
mbstring.http_output_conv_mimetypes => ^(text/|application/xhtml\+xml) => ^(text/|application/xhtml\+xml)
mbstring.internal_encoding => no value => no value
mbstring.language => neutral => neutral
mbstring.script_encoding => no value => no value
mbstring.strict_detection => Off => Off
mbstring.substitute_character => no value => no value

Hexdump:
00000000  73 74 72 69 6e 67 28 31  30 29 20 22 49 53 4f 2d  |string(10) "ISO-|
00000010  38 38 35 39 2d 31 22 0a  31 32 33 e9 31 32 33 3f  |8859-1".123.123?|
00000020  31 32 33                                          |123|
00000023

=> Default behaviour is to have latin1 as internal_encoding.
Characters are no longer UTF-8.

b) With mbstring.internal_encoding="UTF-8"

php -i returned the following:
mbstring.detect_order => no value => no value
mbstring.encoding_translation => Off => Off
mbstring.func_overload => 0 => 0
mbstring.http_input => pass => pass
mbstring.http_output => pass => pass
mbstring.http_output_conv_mimetypes => ^(text/|application/xhtml\+xml) => ^(text/|application/xhtml\+xml)
mbstring.internal_encoding => UTF-8 => UTF-8
mbstring.language => neutral => neutral
mbstring.script_encoding => no value => no value
mbstring.strict_detection => Off => Off
mbstring.substitute_character => no value => no value

Hexdump output:
00000000  73 74 72 69 6e 67 28 35  29 20 22 55 54 46 2d 38  |string(5) "UTF-8|
00000010  22 0a 31 32 33 c3 a9 31  32 33 e2 82 ac 31 32 33  |".123..123...123|
00000020

=> OK

c) With var_dump(mb_internal_encoding('UTF-8')) at the top of the file
(and no changes in php.ini)

php -i: output identical to 2a.

00000000  62 6f 6f 6c 28 74 72 75  65 29 0a 73 74 72 69 6e  |bool(true).strin|
00000010  67 28 35 29 20 22 55 54  46 2d 38 22 0a 31 32 33  |g(5) "UTF-8".123|
00000020  e9 31 32 33 3f 31 32 33                           |.123?123|
00000028

=> In spite of the mb_internal_encoding() call, the output is not UTF-8.

Expected result:
----------------
MBString should not change the source file encoding if there is no default
internal_encoding specified by the user (mbstring.internal_encoding => no
value). If this is expected, phpinfo() (php -i) should at least show the user
that the default internal_encoding is latin1 (ISO-8859-1).

Also, the mb_internal_encoding('UTF-8') function should work on the current
file (see test 2c). Users should not be forced to change internal_encoding
through php.ini (not all users have access to it).

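The bytes seen in tests 2a and 2c match what a plain UTF-8 to ISO-8859-1 transcoding of the test 1 output would produce. Below is a minimal sketch to check this, written in Python rather than PHP so that it does not depend on any mbstring setting. The function name transcode_to_latin1 is made up for this illustration, and replacing unmappable characters with '?' is an assumption chosen to match the observed output, not something taken from the PHP source.

```python
# Script body bytes exactly as shown in the test 1 hexdump (UTF-8 encoded).
UTF8_SOURCE = bytes.fromhex("313233c3a9313233e282ac313233")


def transcode_to_latin1(data: bytes) -> bytes:
    """Decode UTF-8, then re-encode as ISO-8859-1.

    Characters with no ISO-8859-1 equivalent (such as the euro sign)
    are replaced with '?'. This substitution is an assumption made to
    mirror the bytes in the report.
    """
    return data.decode("utf-8").encode("iso-8859-1", errors="replace")


if __name__ == "__main__":
    converted = transcode_to_latin1(UTF8_SOURCE)
    # Space-separated hex, comparable to the tail of the 2a/2c hexdumps.
    print(converted.hex(" "))  # prints: 31 32 33 e9 31 32 33 3f 31 32 33
```

The two-byte sequence c3 a9 collapses to the single byte e9, and the three-byte sequence e2 82 ac becomes 3f ('?'), which is the same 11-byte tail that appears after the var_dump() output in tests 2a and 2c.
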
Actual result:
--------------
Versions tested with same behaviour:
- PHP 5.3.0
- PHP 5.3.0 snap 200909230830
- PHP 5.2.3
- PHP 5.2.11

-- 
Edit bug report at http://bugs.php.net/?id=49638&edit=1