Edit report at http://bugs.php.net/bug.php?id=54028&edit=1
 ID:                 54028
 Comment by:         carsten_sttgt at gmx dot de
 Reported by:        schmale at froglogic dot com
 Summary:            Directory::read() cannot handle non-unicode chars properly
 Status:             Bogus
 Type:               Bug
 Package:            Directory function related
 Operating System:   Windows 7
 PHP Version:        5.3.5
 Block user comment: N
 Private report:     N

 New Comment:

| and the problem does only occur with Windows/CLI.

I see no difference between CGI and CLI (both executed from the shell).

Of course, something is curious:

<?php
$directory = dir(getenv('USERPROFILE'));
while (false !== ($content = $directory->read())) {
    if (mb_check_encoding($content, 'UTF-8') === false) {
        printf('Returned non-utf-8 (%s)', $content);
        printf(" Encoding: %s\r\n", mb_detect_encoding($content));
    }
}
?>

And the output is:
Returned non-utf-8 (Startmenü) Encoding: UTF-8

Regards,
Carsten


Previous Comments:
------------------------------------------------------------------------
[2011-02-15 17:10:43] schmale at froglogic dot com

Well, I don't know what encoding Windows uses, but I do know that it works properly with the Windows CGI version.

The point is, a directory called 'Startmenü' will return 'Startmenü' with Linux/CGI, Linux/CLI and Windows/CGI, but NOT with Windows/CLI, which returns 'Startmenñæ' (or something similar).

In other words: the behaviour with Windows/CLI is broken, whereas the other versions return the exact name of the directory, as expected. So I think it has little or nothing to do with unicode filesystem support or the encoding of Windows, but with differences between CGI and CLI.

------------------------------------------------------------------------
[2011-02-15 16:54:17] paj...@php.net

There is already a feature request for unicode filesystem support. Btw, Windows does not use UTF-8 for its encoding.
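
To illustrate the remark that Windows does not use UTF-8, here is a minimal sketch of how a caller might normalise the names itself. It assumes the names arrive in the system's ANSI code page and that this code page is Windows-1252; neither is confirmed anywhere in this report. The helper name to_utf8 and its default source encoding are hypothetical, and the mbstring extension must be loaded.

```php
<?php
// Hypothetical helper, not part of PHP: return $name as UTF-8.
// Assumption: anything that is not valid UTF-8 is encoded in
// $sourceEncoding (Windows-1252 here, which is only a guess for a
// Western European Windows installation).
function to_utf8($name, $sourceEncoding = 'Windows-1252')
{
    if (mb_check_encoding($name, 'UTF-8')) {
        // Already valid UTF-8 (this includes plain ASCII): leave untouched.
        return $name;
    }
    return mb_convert_encoding($name, 'UTF-8', $sourceEncoding);
}

// "Startmen\xFC" is "Startmenü" written as Windows-1252 bytes.
echo bin2hex(to_utf8("Startmen\xFC")), "\n";
```

Note that the validity check is only a heuristic: a Windows-1252 byte sequence that happens to form valid UTF-8 would be returned unchanged.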
------------------------------------------------------------------------
[2011-02-15 16:51:20] schmale at froglogic dot com

Description:
------------
Notice: This problem ONLY affects the CLI interpreter, NOT the CGI.

Using dir('path/to/dir'), the read() method does not return UTF-8 if the directory contains e.g. umlauts (ä, ö, ü). I tested this on Linux and Windows, both CGI and CLI, and the problem does only occur with Windows/CLI.

Test script:
---------------
$path = 'path/to/directory/which/contains/umlauts';
$directory = dir($path);
while (false !== ($content = $directory->read())) {
    if (mb_check_encoding($content, 'UTF-8') === false) {
        fprintf(STDERR, 'Returned non-utf-8 (%s)', $content);
    }
}

Expected result:
----------------
The expected result, of course, was that the return value of read() is always encoded in UTF-8, i.e. no messages are printed when we run the script.

Actual result:
--------------
If a subdirectory name contains umlauts (or, I guess, any non-ASCII character), a message is printed, i.e. the return value is not encoded in UTF-8.

------------------------------------------------------------------------
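
The test script in the original report only says that a name is not valid UTF-8, and the non-strict mb_detect_encoding() call in the newer comment still answers "UTF-8". A minimal sketch of a more informative check follows, assuming mbstring is loaded. The helper name describe_entry is hypothetical. It reports the raw bytes and uses the strict flag of mb_detect_encoding(), which returns false when the candidate encoding does not match.

```php
<?php
// Hypothetical diagnostic helper, not part of PHP: summarise what
// Directory::read() actually returned for one entry.
function describe_entry($name)
{
    return array(
        // true only if every byte sequence is well-formed UTF-8
        'valid_utf8'    => mb_check_encoding($name, 'UTF-8'),
        // strict detection against UTF-8 alone: 'UTF-8' or false
        'strict_detect' => mb_detect_encoding($name, 'UTF-8', true),
        // raw bytes, so the real encoding can be identified by eye
        'hex'           => bin2hex($name),
    );
}

// "Startmen\xFC" stands in for a name returned in a single-byte code page.
print_r(describe_entry("Startmen\xFC"));
```

A trailing byte of fc in the hex dump (instead of c3bc) would suggest a single-byte code page such as Windows-1252 or ISO-8859-1 rather than UTF-8.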