Edit report at http://bugs.php.net/bug.php?id=54028&edit=1
ID:                 54028
User updated by:    schmale at froglogic dot com
Reported by:        schmale at froglogic dot com
Summary:            Directory::read() cannot handle non-unicode chars properly
Status:             Bogus
Type:               Bug
Package:            Directory function related
Operating System:   Windows 7
PHP Version:        5.3.5
Block user comment: N
Private report:     N

New Comment:

Well, I don't know what Windows uses as its encoding, but I do know that it works properly with the Windows CGI version. The point is, a directory called 'Startmenü' will return 'Startmenü' with Linux/CGI, Linux/CLI and Windows/CGI, but NOT with Windows/CLI, the latter returning 'Startmenñæ' (or something similar).

In other words: the behaviour with Windows/CLI is broken, whereas the other versions return the exact name of the directory, as expected. So I think it has little or nothing to do with unicode filesystem support or the encoding of Windows, but with differences between CGI and CLI.

Previous Comments:
------------------------------------------------------------------------
[2011-02-15 16:54:17] paj...@php.net

There is already a feature request for unicode filesystem support. Btw, Windows does not use UTF-8 for its encoding.

------------------------------------------------------------------------
[2011-02-15 16:51:20] schmale at froglogic dot com

Description:
------------
Notice: This problem affects ONLY the CLI interpreter, NOT the CGI.

Using dir('path/to/dir'), the read() method does not return UTF-8 if the directory contains, e.g., umlauts (ä, ö, ü). I tested this on Linux and Windows, both CGI and CLI, and the problem occurs only with Windows/CLI.

Test script:
---------------
<?php
$path = 'path/to/directory/which/contains/umlauts';
$directory = dir($path);

// Report every entry whose name is not valid UTF-8.
while (false !== ($content = $directory->read())) {
    if (mb_check_encoding($content, 'UTF-8') === false) {
        fprintf(STDERR, "Returned non-utf-8 (%s)\n", $content);
    }
}

Expected result:
----------------
The expected result, of course, was that the return value of read() is always encoded in UTF-8, i.e. no messages are printed when we run the script.

Actual result:
--------------
If a subdirectory name contains umlauts (or, I guess, any non-ASCII character), a message is printed, i.e. the return value is not encoded in UTF-8.

------------------------------------------------------------------------
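
A possible workaround, sketched here under assumptions the report does not confirm: if read() on Windows/CLI returns names in a legacy single-byte code page, each entry could be checked and converted to UTF-8 before use. The helper names `to_utf8()` and `list_dir_utf8()` are hypothetical, and `Windows-1252` is only a guess at the source encoding; the actual code page depends on the system locale.

```php
<?php
// Hypothetical helper (not from the report): return $name unchanged if it is
// already valid UTF-8, otherwise convert it from an ASSUMED source encoding.
function to_utf8($name, $assumedEncoding = 'Windows-1252')
{
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name;
    }
    return mb_convert_encoding($name, 'UTF-8', $assumedEncoding);
}

// Hypothetical helper: list a directory's entries, normalised to UTF-8.
function list_dir_utf8($path)
{
    $entries = array();
    $directory = dir($path);
    if ($directory === false) {
        return $entries; // directory could not be opened
    }
    while (false !== ($entry = $directory->read())) {
        $entries[] = to_utf8($entry);
    }
    $directory->close();
    return $entries;
}
```

Calling `list_dir_utf8('path/to/dir')` would then yield names that pass `mb_check_encoding($name, 'UTF-8')`, although a wrong guess for the source encoding would produce valid UTF-8 with the wrong characters.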