ID:                 54028
Updated by:         paj...@php.net
Reported by:        schmale at froglogic dot com
Summary:            Directory::read() cannot handle non-unicode chars properly
-Status:            Open
+Status:            Bogus
Type:               Bug
Package:            Directory function related
Operating System:   Windows 7
PHP Version:        5.3.5
Block user comment: N
Private report:     N

New Comment:

There is already a feature request for Unicode filesystem support.

Also note that Windows does not use UTF-8 as its filesystem encoding.

Previous Comments:
------------------------------------------------------------------------
[2011-02-15 16:51:20] schmale at froglogic dot com

Description:
------------
Note: This problem ONLY affects the CLI interpreter, NOT the CGI.

When using dir('path/to/dir'), the read() method does not return UTF-8 if the directory contains entries with, e.g., umlauts (ä, ö, ü). I tested this on Linux and Windows, with both CGI and CLI, and the problem only occurs with Windows/CLI.

Test script:
---------------
<?php
// Requires the mbstring extension for mb_check_encoding().
$path = 'path/to/directory/which/contains/umlauts';
$directory = dir($path);
while (false !== ($content = $directory->read())) {
    // Report every entry name that is not valid UTF-8.
    if (mb_check_encoding($content, 'UTF-8') === false) {
        fprintf(STDERR, "Returned non-UTF-8 (%s)\n", $content);
    }
}
$directory->close();

Expected result:
----------------
The return value of read() is always UTF-8 encoded, i.e. no messages are printed when the script runs.

Actual result:
--------------
If a subdirectory name contains umlauts (or, presumably, any non-ASCII character), a message is printed, i.e. the return value is not UTF-8 encoded.
------------------------------------------------------------------------
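On affected setups, one possible workaround is to re-encode entry names after read() returns them. The following is a minimal sketch, not part of the report, and it rests on assumptions: the helper toUtf8Name() is hypothetical, and names that are not valid UTF-8 are assumed to be in Windows-1252, a common ANSI code page on Western European Windows installs. Substitute the code page actually in use. It also requires the mbstring extension. The validity check is only a heuristic: a Windows-1252 name whose bytes happen to form valid UTF-8 is passed through unchanged.

```php
<?php
// Sketch only: assumes non-UTF-8 entry names are Windows-1252 encoded.
// Requires the mbstring extension.

/**
 * Hypothetical helper: return $name unchanged if it is already valid
 * UTF-8; otherwise re-encode it from $fallback to UTF-8.
 */
function toUtf8Name($name, $fallback = 'Windows-1252')
{
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name;
    }
    return mb_convert_encoding($name, 'UTF-8', $fallback);
}

// Usage with the Directory API from the report (the path is a placeholder).
$path = 'path/to/directory/which/contains/umlauts';
if (is_dir($path)) {
    $directory = dir($path);
    while (false !== ($entry = $directory->read())) {
        echo toUtf8Name($entry), "\n";
    }
    $directory->close();
}
```

For example, the Windows-1252 byte string "M\xFCller" is converted to the UTF-8 string "Müller", while names that are already valid UTF-8 are left alone.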