Edit report at http://bugs.php.net/bug.php?id=54028&edit=1
ID:                 54028
Comment by:         carsten_sttgt at gmx dot de
Reported by:        schmale at froglogic dot com
Summary:            Directory::read() cannot handle non-unicode chars properly
Status:             Bogus
Type:               Bug
Package:            Directory function related
Operating System:   Windows 7
PHP Version:        5.3.5
Block user comment: N
Private report:     N

New Comment:

> Windows supports UCS-2 internally via the wide char APIs.

I know... I'm just wondering why "mb_detect_encoding($content)" is returning 'UTF-8' while "mb_check_encoding($content, 'UTF-8')" is returning FALSE?

Also, I think there is another problem:
| C:\Users\Carsten Wiedmann>php -r "echo realpath('.');"
| C:\Users\Carsten Wiedmann
| C:\Users\Carsten Wiedmann>cd Startmenü
|
| C:\Users\Carsten Wiedmann\Startmenü>php -r "echo realpath('.');"
|
| C:\Users\Carsten Wiedmann\Startmenü>

Regards,
Carsten

Previous Comments:
------------------------------------------------------------------------
[2011-02-25 13:32:49] paj...@php.net

There is no UTF-8 support in Windows APIs or in PHP for the file system APIs. Windows supports UCS-2 internally via the wide char APIs. PHP relies on the ANSI APIs, and the encoding is then the runtime encoding (whatever is set for the running process or system-wide).

The feature request I was referring to is about making PHP use the wide char API and accept UTF-8 as input (and output).

------------------------------------------------------------------------
[2011-02-25 13:29:15] carsten_sttgt at gmx dot de

| and the problem does only occur with Windows/CLI.

I see no difference between CGI and CLI (both executed from the shell).

Of course, something is curious:

<?php
$directory = dir(getenv('USERPROFILE'));
while (false !== ($content = $directory->read())) {
    if (mb_check_encoding($content, 'UTF-8') === false) {
        printf('Returned non-utf-8 (%s)', $content);
        printf(" Encoding: %s\r\n", mb_detect_encoding($content));
    }
}
?>

And the output is:
Returned non-utf-8 (Startmenü) Encoding: UTF-8

Regards,
Carsten

------------------------------------------------------------------------
[2011-02-15 17:10:43] schmale at froglogic dot com

Well, I don't know what encoding Windows uses, but I do know that it works properly with the Windows CGI version. The point is, a directory called 'Startmenü' will return 'Startmenü' with Linux/CGI, Linux/CLI, and Windows/CGI, but NOT with Windows/CLI, which returns 'Startmenñæ' (or something similar).

In other words: the behaviour with Windows/CLI is broken, whereas the other versions return the exact name of the directory, as expected. So I think it has nothing (or little) to do with unicode filesystem support or the encoding of Windows, but with differences between CGI and CLI.

------------------------------------------------------------------------
[2011-02-15 16:54:17] paj...@php.net

There is already a feature request for unicode filesystem support. Btw, Windows does not use UTF-8 for its encoding.

------------------------------------------------------------------------
[2011-02-15 16:51:20] schmale at froglogic dot com

Description:
------------
Notice: This problem ONLY affects the CLI interpreter, NOT the CGI.

Using dir('path/to/dir'), the read() method does not return UTF-8 if the directory contains e.g. umlauts (ä, ö, ü).

I tested this on Linux and Windows, both CGI and CLI, and the problem does only occur with Windows/CLI.

Test script:
---------------
<?php
$path = 'path/to/directory/which/contains/umlauts';
$directory = dir($path);
// Report every entry whose name is not valid UTF-8.
while (false !== ($content = $directory->read())) {
    if (mb_check_encoding($content, 'UTF-8') === false) {
        fprintf(STDERR, 'Returned non-utf-8 (%s)', $content);
    }
}

Expected result:
----------------
The expected result, of course, was that the return value of read() is always encoded in UTF-8, i.e. no messages are printed when we run the script.

Actual result:
--------------
If a subdirectory name contains umlauts (or, I guess, any non-ASCII character), a message is printed, i.e. the return value is not encoded in UTF-8.

------------------------------------------------------------------------
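
On the encoding mismatch discussed in this thread: if PHP on Windows returns directory entries in the process's ANSI code page rather than UTF-8, a script can normalise the names itself. The following is only a minimal sketch under stated assumptions, not a fix in PHP. entry_to_utf8() is a hypothetical helper, and Windows-1252 is an assumed code page (the real one depends on the system locale). It relies on mb_check_encoding(), as used in the test script, and on mb_convert_encoding() from the same mbstring extension.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): return $name as UTF-8.
 *
 * Names that are already valid UTF-8 are returned unchanged. Anything else
 * is ASSUMED to be in $assumedCodepage; Windows-1252 is only a guess and
 * must be adjusted to the ANSI code page actually in use.
 */
function entry_to_utf8($name, $assumedCodepage = 'Windows-1252')
{
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name;
    }
    return mb_convert_encoding($name, 'UTF-8', $assumedCodepage);
}
```

Inside the loop of the test script, calling entry_to_utf8($content) after the "false !==" check would then yield UTF-8 names, provided the assumed code page matches the one the names were returned in.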